Mathematical Foundations of Machine Learning

Computer Science 25300 / 35300 & Statistics 27700

This course is an introduction to key mathematical concepts at the heart of machine learning. It focuses on matrix methods and statistical models, with real-world applications ranging from classification and clustering to denoising and recommender systems. Mathematical topics include linear equations, matrix rank, subspaces, regression, regularization, the singular value decomposition, and iterative optimization algorithms. Machine learning topics include least squares classification and regression, ridge regression, principal components analysis, principal components regression, kernel methods, matrix completion, support vector machines, clustering, stochastic gradient descent, neural networks, and deep learning. Knowledge of linear algebra and statistics is not assumed.

Appropriate for graduate students or advanced undergraduates. This course can serve as a precursor to TTIC 31020, “Introduction to Machine Learning,” or CMSC 35400.

Prerequisites:

Students are expected to have taken a course in calculus and have exposure to numerical computing (e.g., Matlab, Python, Julia, or R).

Textbooks:

Other resources:

Fall 2025

All course videos are being posted to a YouTube channel. The videos linked below are Panopto videos with (potentially faulty) captions.

Lectures from past quarters:

Written lecture notes from Fall 2023

Videos of past lectures (from 2020 and 2021; imperfectly aligned with the most recent class notes)

Topics:

Intro and Linear Models

  • What is ML, and how is it related to other disciplines?
  • Learning goals and course objectives.
  • Vectors and matrices in machine learning models
  • Features and models
  • Least squares, linear independence and orthogonality
  • Linear classifiers
  • Loss, risk, generalization
  • Applications: bioinformatics, face recognition
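As a taste of the least-squares material above, here is a minimal sketch, using hypothetical NumPy data, of fitting a linear model by solving the least-squares problem:

```python
import numpy as np

# Hypothetical noiseless data: 100 samples, 3 features (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

# Solve min_w ||Xw - y||^2; with linearly independent columns and no noise,
# least squares recovers the true weights exactly.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

In the course, this solution is connected to linear independence and orthogonality: the fitted residual `X @ w_hat - y` is orthogonal to the column space of `X`.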

Singular Value Decomposition (Principal Component Analysis)

  • Dimensionality reduction
  • Applications: recommender systems, PageRank
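The dimensionality-reduction idea can be sketched with a truncated SVD; the data matrix below is hypothetical, constructed to have exact rank-2 structure:

```python
import numpy as np

# Hypothetical 50 x 10 data matrix with exact rank-2 structure (for illustration).
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 10))

# Keeping the top k singular triples gives the best rank-k approximation
# of A in the least-squares sense (Eckart-Young).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = (U[:, :k] * s[:k]) @ Vt[:k]
```

Because `A` was built with rank 2, the rank-2 truncation reproduces it; on real data the truncation discards the smallest singular values, which is the basis of PCA.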

Overfitting and Regularization

  • Ridge regression
  • Model selection, cross-validation
  • Applications: image deblurring
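The ridge regression topic has a one-line closed form; a minimal sketch on hypothetical data, showing how the regularization strength shrinks the weights:

```python
import numpy as np

# Hypothetical data (assumed for illustration).
rng = np.random.default_rng(2)
X = rng.standard_normal((30, 5))
y = rng.standard_normal(30)

def ridge(X, y, lam):
    # Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_small = ridge(X, y, lam=0.1)
w_large = ridge(X, y, lam=100.0)
# Heavier regularization pulls the weights toward zero.
```

In practice the value of `lam` is chosen by cross-validation, which is the model-selection topic in this unit.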

Beyond Least Squares: Alternate Loss Functions

  • Hinge loss
  • Logistic regression
  • Feature functions and nonlinear regression and classification
  • Kernel methods and support vector machines
  • Application: Handwritten digit classification
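The two alternate loss functions in this unit can be written in a few lines each; a sketch on a tiny hypothetical separable dataset:

```python
import numpy as np

def hinge_loss(w, X, y):
    # Labels y in {-1, +1}; mean of max(0, 1 - y * x^T w).
    return np.mean(np.maximum(0.0, 1.0 - y * (X @ w)))

def logistic_loss(w, X, y):
    # Mean of log(1 + exp(-y * x^T w)), computed stably via logaddexp.
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

# Hypothetical 1-D linearly separable toy data.
X = np.array([[2.0], [3.0], [-2.0], [-3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = np.array([1.0])
```

With this `w`, every point has margin at least 1, so the hinge loss is exactly zero, while the logistic loss is positive but small; this difference in behavior past the margin is one contrast drawn between SVMs and logistic regression.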

Iterative Methods

  • Stochastic Gradient Descent (SGD)
  • Neural networks and backpropagation
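Stochastic gradient descent can be sketched in a few lines on the squared loss; the data below are hypothetical, and the step size and epoch count are assumed values:

```python
import numpy as np

# Hypothetical regression data with small noise (assumed for illustration).
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 4))
w_true = np.array([2.0, -1.0, 0.0, 3.0])
y = X @ w_true + 0.01 * rng.standard_normal(200)

# SGD: one randomly ordered pass over the data per epoch, updating with
# the gradient of a single example at a time.
w = np.zeros(4)
step = 0.01
for epoch in range(50):
    for i in rng.permutation(len(y)):
        g = (X[i] @ w - y[i]) * X[i]  # gradient of (x_i^T w - y_i)^2 / 2
        w -= step * g
```

The same per-example update, applied through the chain rule layer by layer, is backpropagation for neural networks.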

Statistical Models

  • Density estimation and maximum likelihood estimation
  • Gaussian mixture models and expectation-maximization (EM)
  • Unsupervised learning and clustering
  • Application: text classification
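The maximum likelihood idea in this unit has a simple closed form in the Gaussian case; a sketch on hypothetical data drawn with assumed parameters (mean 5, standard deviation 2):

```python
import numpy as np

# Hypothetical 1-D Gaussian sample (assumed parameters for illustration).
rng = np.random.default_rng(4)
data = rng.normal(loc=5.0, scale=2.0, size=10_000)

# For a Gaussian, the maximum likelihood estimates are the sample mean
# and the (biased, 1/n) sample variance.
mu_hat = data.mean()
sigma2_hat = ((data - mu_hat) ** 2).mean()
```

When the data instead come from a mixture of Gaussians, these closed forms no longer apply directly, which motivates the expectation-maximization algorithm covered in this unit.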