This course is an introduction to key mathematical concepts at the heart of machine learning. The focus is on matrix methods and statistical models and features real-world applications ranging from classification and clustering to denoising and recommender systems. Mathematical topics covered include linear equations, regression, regularization, the singular value decomposition, iterative optimization algorithms, and probabilistic models. Machine learning topics include the LASSO, support vector machines, kernel methods, clustering, dictionary learning, neural networks, and deep learning. Students are expected to have taken a course in calculus and have exposure to numerical computing (e.g. Matlab, Python, Julia, or R). Knowledge of linear algebra and statistics is not assumed.
Appropriate for graduate students or advanced undergraduates. This course could be used as a precursor to TTIC 31020, “Introduction to Machine Learning,” or CMSC 35400.
Class place and time: Mondays and Wednesdays, 3-4:15pm
Instructor: Rebecca Willett
Office: 321 Crerar
Office hours: Mondays, 1:30-2:30pm when classes are in session
TAs: Zewei Chu, Alexander Hoover, Nathan Mull, Christopher Jones
Grader: Owen Melia
Email policy: The TAs and I will prioritize answering questions posted to Piazza, NOT individual emails.
Prerequisites: Students are expected to have taken a course in calculus and have exposure to numerical computing (e.g. Matlab, Python, Julia, or R).
Textbooks:
- Matrix Methods in Data Mining and Pattern Recognition by Lars Eldén
- Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares by Stephen Boyd and Lieven Vandenberghe
- Pattern Recognition and Machine Learning by Christopher Bishop (optional)
The textbooks will be supplemented with additional notes and readings.
Grading: Graduate and undergraduate students are expected to perform at the graduate level and are held to the same standard. All students will be evaluated through regular homework assignments, quizzes, and exams, weighted as follows:
Homework: 30%. Homework will be assigned roughly weekly (about 8 assignments total). Problems include both mathematical derivations and proofs and more applied exercises that involve writing code and working with real or synthetic data sets.
Exams: 40%. Two exams (20% each).
Midterm: Wednesday, Feb. 6, 6-8pm in KPTC 120
Final: Wednesday, March 13, 6-8pm in KPTC 120
Quizzes: 30%. Quizzes will be administered via Canvas and will cover material from the preceding lectures.
Letter grades will be assigned using the following hard cutoffs:
A: 93% or higher
A-: 90% or higher
B+: 87% or higher
B: 83% or higher
B-: 80% or higher
C+: 77% or higher
C: 60% or higher
D: 50% or higher
F: less than 50%
We reserve the right to curve the grades, but only in a fashion that would improve the grade earned by the stated rubric.
Homework and quiz policy: Your lowest quiz score and your lowest homework score will not count toward your final grade. This policy allows you to miss one quiz and one homework assignment without penalty, but no more than one of each. Plan accordingly.
Tentative lecture schedule:
Weeks 1-2: Intro and Linear Models
What is machine learning, and how is it related to other disciplines?
Learning goals and course objectives.
Vectors and matrices in machine learning models
Features and models
Least squares, linear independence and orthogonality
Loss, risk, generalization
Applications: bioinformatics, face recognition
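As a concrete preview of the least-squares material in these weeks, here is a minimal sketch using NumPy; the data is synthetic and purely illustrative:

```python
import numpy as np

# Least squares: find weights w minimizing ||Xw - y||_2.
# Synthetic, noise-free data: y depends linearly on two features.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
w_true = np.array([2.0, -1.0])
y = X @ w_true

# np.linalg.lstsq solves the least-squares problem in a numerically stable way.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because the data here is noise-free and the columns of X are linearly independent, the recovered weights match the true weights.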
Week 3: Singular Value Decomposition (Principal Component Analysis)
Applications: recommender systems, PageRank
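A small preview of the SVD/PCA connection: the top right singular vectors of a centered data matrix are its principal directions. The data below is synthetic, with most variance deliberately placed along the first coordinate:

```python
import numpy as np

# PCA via the SVD of the centered data matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 3)) @ np.diag([5.0, 1.0, 0.1])  # anisotropic data
A_centered = A - A.mean(axis=0)

U, s, Vt = np.linalg.svd(A_centered, full_matrices=False)
top_direction = Vt[0]            # first principal direction
scores = A_centered @ Vt[:2].T   # projection onto the top two components
```

The singular values quantify how much variance each principal component captures, which is the basis for low-rank approximations used in recommender systems.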
Week 4: Overfitting and Regularization
The Lasso and proximal point algorithms
Model selection, cross-validation
Applications: image deblurring, compressed sensing
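To preview the Lasso material, here is a sketch of one simple proximal method, proximal gradient descent (ISTA), whose proximal step is elementwise soft-thresholding; the data and the choice of regularization weight are illustrative:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize 0.5*||Xw - y||^2 + lam*||w||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10)
w_true[:2] = [3.0, -2.0]                     # sparse ground truth
y = X @ w_true
w_hat = lasso_ista(X, y, lam=0.5)
```

The l1 penalty drives most coefficients exactly to zero, which is why the Lasso is used for variable selection and compressed sensing.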
Weeks 5-6: Beyond Least Squares: Alternate Loss Functions
Feature functions and nonlinear regression and classification
Kernel methods and support vector machines
Application: Handwritten digit classification
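As a preview of kernel methods, the sketch below fits a nonlinear function with kernel ridge regression using a Gaussian (RBF) kernel; the target function, kernel bandwidth, and regularization weight are all illustrative choices:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

# Kernel ridge regression: fit a nonlinear function without explicit features.
rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(80, 1))
y = np.sin(2 * X[:, 0])                  # nonlinear target

lam = 1e-3
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # dual coefficients

X_test = np.array([[0.5]])
y_pred = rbf_kernel(X_test, X) @ alpha
```

The same kernel trick underlies support vector machines: the model is expressed entirely through inner products (kernel evaluations) between data points.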
Week 7: Iterative Methods
Stochastic Gradient Descent (SGD)
Neural networks and backpropagation
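A minimal sketch of stochastic gradient descent on the least-squares loss: each update uses the gradient from a single randomly chosen example rather than the full dataset. The data, step size, and iteration count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((500, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                              # noise-free synthetic labels

w = np.zeros(3)
step = 0.01
for _ in range(20_000):
    i = rng.integers(len(X))                # sample one example uniformly
    grad_i = (X[i] @ w - y[i]) * X[i]       # gradient of 0.5*(x_i . w - y_i)^2
    w -= step * grad_i
```

The same one-example-at-a-time update, combined with backpropagation to compute the gradients, is how neural networks are typically trained.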
Week 8: Statistical Models
Density estimation and maximum likelihood estimation
Gaussian mixture models and Expectation Maximization
Unsupervised learning and clustering
Application: text classification
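Full EM for Gaussian mixtures is too long for a sketch, but k-means, which can be viewed as a hard-assignment limit of EM, shows the same alternating structure. The two well-separated synthetic clusters below are illustrative:

```python
import numpy as np

# k-means (Lloyd's algorithm): alternate between assigning each point to its
# nearest center and recomputing each center as the mean of its assigned points.
rng = np.random.default_rng(5)
cluster_a = rng.standard_normal((50, 2)) + np.array([5.0, 5.0])
cluster_b = rng.standard_normal((50, 2)) - np.array([5.0, 5.0])
X = np.vstack([cluster_a, cluster_b])

centers = X[rng.choice(len(X), size=2, replace=False)]   # init from data points
for _ in range(20):
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)                        # assignment step
    centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])
```

EM replaces the hard assignment with soft posterior probabilities and the mean update with weighted maximum likelihood estimates.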
Week 9: Ensemble Methods
Random forests, bagging
Application: electronic health record analysis
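To preview bagging, the sketch below trains decision stumps (single-feature threshold rules, used here as a stand-in weak learner) on bootstrap resamples and aggregates them by majority vote; the data and ensemble size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.uniform(-1, 1, size=(200, 1))
y = (X[:, 0] > 0.1).astype(int)              # labels from a simple threshold rule

def fit_stump(Xb, yb):
    """Pick the threshold t minimizing training error of the rule x > t."""
    thresholds = Xb[:, 0]
    errs = [np.mean((Xb[:, 0] > t).astype(int) != yb) for t in thresholds]
    return thresholds[int(np.argmin(errs))]

stumps = []
for _ in range(25):
    idx = rng.integers(len(X), size=len(X))  # bootstrap resample (with replacement)
    stumps.append(fit_stump(X[idx], y[idx]))

x_new = 0.5
votes = [int(x_new > t) for t in stumps]
pred = int(np.mean(votes) > 0.5)             # majority vote
```

Random forests extend this idea by bagging decision trees and additionally randomizing the features considered at each split.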