Mathematical Foundations of Machine Learning (Fall 2021)

Computer Science 25300 / 35300 & Statistics 27700

This course is an introduction to key mathematical concepts at the heart of machine learning. The focus is on matrix methods and statistical models and features real-world applications ranging from classification and clustering to denoising and recommender systems. Mathematical topics covered include linear equations, matrix rank, subspaces, regression, regularization, the singular value decomposition, and iterative optimization algorithms. Machine learning topics include least squares classification and regression, ridge regression, principal components analysis, principal components regression, kernel methods, matrix completion, support vector machines, clustering, stochastic gradient descent, neural networks, and deep learning. Students are expected to have taken a course in calculus and have exposure to numerical computing (e.g. Matlab, Python, Julia, or R). Knowledge of linear algebra and statistics is not assumed.

Appropriate for graduate students or advanced undergraduates. This course could be used as a precursor to TTIC 31020, “Introduction to Machine Learning” or CMSC 35400.


Class place and time:

  • Tuesdays and Thursdays, 11am-12:20pm and 2-3:20pm, Ryerson 251
  • In-person attendance is optional. Recorded videos and lecture notes will be made available.

Ed Discussion:

This term, we will be using Ed Discussion for class discussion. The system is designed to get you help quickly and efficiently from classmates, the TAs, and me. Rather than emailing questions to the teaching staff, I encourage you to post your questions on Ed Discussion. Find our class page here.

Instructor: Rebecca Willett

TAs: Chih-chan Tien, Tapan Srivastava, Zhuokai Zhao, Zixin Ding, Xiaoan Ding, Carlo Siebenschuh, Zhisheng Xiao, Xialiang Dou

Graders: Annabelle (Sujun) Tang, Advait Ganapathy

Email policy: We will answer questions posted to Ed Discussion, not individual emails.

Office Hours:

Posted on Canvas

Note: fundamentals review sessions are not for homework help. The TA will clarify conceptual questions and/or review material from class. Elena also offers fundamentals reviews for those who are currently not in Chicago.





Grading Policy

All students will be evaluated through regular homework assignments, quizzes, and exams. The final grade will be allocated across these components as follows:

Homework (50% UG, 40% G): Homework will be assigned roughly weekly (about 8 assignments total). Homework problems include mathematical derivations and proofs as well as more applied problems that involve writing code and working with real or synthetic data sets.

Exams (40%): Two exams (20% each).
Midterm: TBD, around Oct. 30
Final: Tuesday, Dec. 8, 10:30 AM-12:30 PM

Quizzes (10%): Quizzes will be administered via Canvas and cover material from the past few lectures.

Final project (grad students only, 10%)

Letter grades will be assigned using the following hard cutoffs:

A: 93% or higher
A-: 90% or higher
B+: 87% or higher
B: 83% or higher
B-: 80% or higher
C+: 77% or higher
C: 60% or higher
D: 50% or higher
F: less than 50%

We reserve the right to curve the grades, but only in a fashion that would improve the grade earned by the stated rubric.

Homework and quiz policy: Your lowest quiz score and your lowest homework score will not count toward your final grade. This policy allows you to miss one quiz and one homework assignment, but only one of each. Plan accordingly.

Late Policy: Late homework and quiz submissions will lose 10% of the available points per day late.

Pass/Fail Grading: A grade of P is given only for work of C- quality or higher. You should make the request for Pass/Fail grading in writing (private note on Ed Discussion). You must request Pass/Fail grading prior to the day of the final exam.

Lectures from past quarters:


Lecture 1: Introduction notes, video

Lecture 2: Vectors and Matrices notes, video 2019, video 2021

Lecture 3: Least Squares and Geometry notes, video 2019, video 2021

Lecture 4: Least Squares and Optimization notes, video 2019, video 2021

Lecture 5: Gradient descent for least squares notes, video 2021

Lecture 6: Subspaces, Bases, and Projections notes, video 2019, video 2021

Lecture 7: Finding Orthogonal Bases notes, video 2019, video 2021

Lecture 8: Introduction to the Singular Value Decomposition notes, video, video 2021

Lecture 9: The Singular Value Decomposition notes, video, video 2021

Lecture 10: SVD, PCA, and Dimensionality Reduction notes, video, video 2021

Lecture 11: PCR & Ridge Regression notes, video, video 2021

Lecture 12: Bias in ML and Matrix Completion (with notes on PageRank) notes, bias slides, video, video 2021, video on matrix completion 2021

Lecture 13: Kernel Ridge Regression notes, video, video 2021

Lecture 14: Support Vector Machines notes, video, video 2021

Lecture 15: Stochastic Gradient Descent notes, video, video 2021

Lecture 16: Deeper Neural Networks video 2021

Lecture 17: Backpropagation video, notes, video 2021

Lecture 18: Clustering and K-means notes, video, video 2021

Tentative schedule:

Weeks 1-2: Intro and Linear Models

What is ML, how is it related to other disciplines?
Learning goals and course objectives.
Vectors and matrices in machine learning models
Features and models
Least squares, linear independence and orthogonality
Linear classifiers
Loss, risk, generalization
Applications: bioinformatics, face recognition
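As a preview of these first weeks, a least-squares fit can be computed in a few lines of NumPy (a minimal sketch on synthetic data; the variable names, sizes, and noise level are purely illustrative, not course-provided code):

```python
import numpy as np

# Synthetic regression data: y = X w + noise, with n samples and d features.
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.standard_normal((n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.01 * rng.standard_normal(n)

# Least-squares estimate: w_hat = argmin_w ||X w - y||^2.
# np.linalg.lstsq solves this directly; the course also derives the
# normal-equations view, w_hat = (X^T X)^{-1} X^T y.
w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
```

With more samples than features and modest noise, `w_hat` lands close to `w_true`.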

Week 3: Singular Value Decomposition (Principal Component Analysis)

Dimensionality reduction
Applications: recommender systems, PageRank
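To illustrate the dimensionality-reduction idea behind this week, here is a small NumPy sketch (synthetic data; the sizes and noise level are illustrative) showing how the singular values of a data matrix reveal low effective rank, and how the SVD yields a low-rank approximation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Data matrix whose rows lie near a 2-dimensional subspace of R^5.
A = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 5))
A = A + 0.01 * rng.standard_normal(A.shape)

# SVD: A = U diag(s) V^T.  The singular values s are sorted in
# decreasing order; two are large and three are near zero here.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Best rank-2 approximation (Eckart-Young): keep the top 2 components.
A2 = (U[:, :2] * s[:2]) @ Vt[:2]
```

The gap between the second and third singular values is what PCA exploits to choose the number of components.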

Week 4: Overfitting and Regularization

Ridge regression
Model selection, cross-validation
Applications: image deblurring
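As a small taste of ridge regression (a minimal sketch; the `ridge` helper and the toy data are made up for illustration):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge regression: argmin_w ||X w - y||^2 + lam * ||w||^2,
    solved via the regularized normal equations."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# A larger regularization parameter lam shrinks the estimate toward zero.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w_ls = ridge(X, y, 0.0)    # lam = 0 recovers ordinary least squares
w_rr = ridge(X, y, 10.0)   # heavy regularization
```

Choosing `lam` is exactly the model-selection problem addressed by cross-validation.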

Weeks 5-6: Beyond Least Squares: Alternate Loss Functions

Hinge loss
Logistic regression
Feature functions and nonlinear regression and classification
Kernel methods and support vector machines
Application: Handwritten digit classification

Week 7: Iterative Methods

Stochastic Gradient Descent (SGD)
Neural networks and backpropagation
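The SGD idea from this week can be sketched in a few lines on a least-squares problem: at each step, follow the gradient of the loss on a single randomly chosen sample rather than the full data set (synthetic data; the step size and iteration count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 2
X = rng.standard_normal((n, d))
w_true = np.array([1.0, -2.0])
y = X @ w_true + 0.1 * rng.standard_normal(n)

# SGD on the least-squares loss (1/n) * sum_i (x_i^T w - y_i)^2:
# each update uses the gradient of one sample's loss only.
w = np.zeros(d)
step = 0.01
for t in range(5000):
    i = rng.integers(n)
    grad = 2.0 * (X[i] @ w - y[i]) * X[i]
    w = w - step * grad
```

The same per-sample update, applied through the chain rule, is what backpropagation computes for neural networks.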

Week 8: Statistical Models

Density estimation and maximum likelihood estimation
Gaussian mixture models and Expectation Maximization
Unsupervised learning and clustering
Application: text classification

Week 9: Ensemble Methods

Decision trees
Random forests, bagging
Application: electronic health record analysis