**Fall 2019 Computer Science 25300 / 35300 & Statistics 27700: Mathematical Foundations of Machine Learning**

### Outline:

This course is an introduction to key mathematical concepts at the heart of machine learning. It focuses on matrix methods and statistical models, and it features real-world applications ranging from classification and clustering to denoising and recommender systems. Mathematical topics covered include linear equations, regression, regularization, the singular value decomposition, iterative optimization algorithms, and probabilistic models. Machine learning topics include the LASSO, support vector machines, kernel methods, clustering, dictionary learning, neural networks, and deep learning. Students are expected to have taken a course in calculus and have exposure to numerical computing (e.g. Matlab, Python, Julia, or R). Knowledge of linear algebra and statistics is not assumed.

Appropriate for graduate students or advanced undergraduates. This course could be used as a precursor to TTIC 31020, “Introduction to Machine Learning,” or CMSC 35400.

## Lectures:

Lecture 1: Introduction notes, video

Lecture 2: Vectors and matrices in machine learning notes, video

Lecture 3: Least squares and geometry notes, video

Lecture 4: Least squares and optimization notes, video

Lecture 5: Subspaces, bases, and projections notes, video

Lecture 6: Finding orthogonal bases notes, video

Lecture 7: Introduction to the Singular Value Decomposition notes video

Lecture 8: The Singular Value Decomposition notes video

Lecture 9: The SVD in Machine Learning notes video

Lecture 10: More on the SVD in Machine Learning (including matrix completion) notes video

Lecture 11: PageRank and Ridge Regression notes video

Lecture 12: Kernel Ridge Regression notes video

Lecture 13: Support Vector Machines notes video

Lecture 14: Basic Convex Optimization notes video

Lectures 15-16: Stochastic gradient descent and neural networks video 1, video 2

Lecture 17: Clustering and K-means notes video

### Logistics:

Class place and time:

- Mondays and Wednesdays, 9-10:20am in Crerar 011
- Mondays and Wednesdays, 3-4:15pm in Ryerson 251

*Piazza*:

This term we will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from classmates, the TAs, and me. Rather than emailing questions to the teaching staff, I encourage you to post your questions on Piazza. If you have any problems or feedback for the developers, email team@piazza.com. Find our class page at: https://piazza.com/uchicago/fall2019/cmsc2530035300stat27700/home

*Course Website:* https://willett.psd.uchicago.edu/teaching/fall-2019-mathematical-foundations-of-machine-learning/

Instructor: Rebecca Willett

*TAs*:

Ruoxi (Roxie) Jiang (Head TA), Lang Yu, Zhuokai Zhao, Yuhao Zhou, Takintayo (Tayo) Akinbiyi, Bumeng Zhuo

*Graders*: Pranav Nanga, Blake Anderson

*Email policy*: We will prioritize answering questions posted to Piazza, **not** individual emails.

### Office Hours

Becca: Wednesdays 10:30-11:30am, JCL 257, starting the week of Oct. 7.

Lang and Roxie: Tuesdays 12:30-1:30pm, Crerar 298 (there will be slight changes in the 2nd and 4th weeks, i.e., Oct. 8 and Oct. 22, because of a room reservation problem; updates will be posted on Canvas)

Tayo: Mondays 11am-12pm in Jones 304 (This session is NOT for homework help, but rather for additional help with lectures and fundamentals.)

Zhuokai: Mondays 11am to 12pm, Location TBD

Bumeng: Wednesdays 2pm to 3pm, Jones 304

Yuhao: Fridays 2-3pm in JCL 354

Prerequisites: Students are expected to have taken a course in calculus and have exposure to numerical computing (e.g. Matlab, Python, Julia, or R).

### Textbooks:

Matrix Methods in Data Mining and Pattern Recognition by Lars Elden

Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares by Stephen Boyd and Lieven Vandenberghe

Pattern Recognition and Machine Learning by Christopher Bishop (optional)

The textbooks will be supplemented with additional notes and readings.

### Evaluation:

**Grading Policy**

All students will be evaluated by regular homework assignments, quizzes, and exams. The final grade will be allocated to the different components as follows:

Homework (50% UG, 40% G): Homework is assigned roughly weekly (about 8 assignments total). Homework problems include mathematical derivations and proofs as well as more applied problems that involve writing code and working with real or synthetic data sets.

Exams (40%): Two exams (20% each).

Midterm: Wednesday, Oct. 30, 6-8pm, location TBD

Final: TBD

Quizzes (10%): Quizzes will be given via Canvas and cover material from the past few lectures.

Final project (grad students only, 10%)

Letter grades will be assigned using the following hard cutoffs:

A: 93% or higher

A-: 90% or higher

B+: 87% or higher

B: 83% or higher

B-: 80% or higher

C+: 77% or higher

C: 60% or higher

D: 50% or higher

F: less than 50%

We reserve the right to curve the grades, but only in a fashion that would improve the grade earned by the stated rubric.

*Homework and quiz policy:* Your lowest quiz score and your lowest homework score will not be counted towards your final grade. This policy allows you to miss class during a quiz or miss an assignment, but only one each. Plan accordingly.

*Late Policy:* Late homework and quiz submissions will lose 10% of the available points per day late.

*Pass/Fail Grading:* A grade of P is given only for work of C- quality or higher. You should make the request for Pass/Fail grading in writing (private note on Piazza). You must request Pass/Fail grading *prior* to the day of the final exam.
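As a rough illustration of how the weights, cutoffs, drop-lowest rule, and late penalty above combine, here is a minimal Python sketch. It is not an official grade calculator. It assumes that every score is a percentage between 0 and 100, that assignments within a category count equally, that the two exams are averaged, and that a late score cannot drop below zero; none of these details are stated in the syllabus. All function and variable names are hypothetical, and any curve or Pass/Fail handling is left out.

```python
# Category weights from the grading policy (undergraduate vs. graduate).
WEIGHTS = {
    "undergrad": {"homework": 0.50, "exams": 0.40, "quizzes": 0.10, "project": 0.00},
    "grad": {"homework": 0.40, "exams": 0.40, "quizzes": 0.10, "project": 0.10},
}

# Hard letter-grade cutoffs, highest first.
CUTOFFS = [(93, "A"), (90, "A-"), (87, "B+"), (83, "B"),
           (80, "B-"), (77, "C+"), (60, "C"), (50, "D")]


def late_score(raw, days_late, available=100.0):
    """Subtract 10% of the available points per day late (assumed floor of 0)."""
    return max(0.0, raw - 0.10 * available * days_late)


def drop_lowest_mean(scores):
    """Mean of the scores after discarding the single lowest one."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to drop one")
    kept = sorted(scores)[1:]
    return sum(kept) / len(kept)


def final_percentage(level, homework, quizzes, midterm, final, project=0.0):
    """Weighted course percentage; assumes equal weighting within each category."""
    w = WEIGHTS[level]
    return (w["homework"] * drop_lowest_mean(homework)
            + w["exams"] * (midterm + final) / 2
            + w["quizzes"] * drop_lowest_mean(quizzes)
            + w["project"] * project)


def letter_grade(pct):
    """Map a percentage to a letter using the stated hard cutoffs."""
    for cutoff, letter in CUTOFFS:
        if pct >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical undergraduate who missed one homework and did poorly on one quiz.
    pct = final_percentage("undergrad", homework=[100, 90, 80, 0],
                           quizzes=[100, 50], midterm=80, final=90)
    print(round(pct, 2), letter_grade(pct))
```

In this hypothetical example the missed homework and the weaker quiz are both dropped, so only the remaining scores enter the weighted sum.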

## Tentative schedule:

**Weeks 1-2:** **Intro and Linear Models**

What is ML, how is it related to other disciplines?

Learning goals and course objectives.

Vectors and matrices in machine learning models

Features and models

Least squares, linear independence and orthogonality

Linear classifiers

Loss, risk, generalization

Applications: bioinformatics, face recognition

**Week 3:** **Singular Value Decomposition (Principal Component Analysis)**

Dimensionality reduction

Applications: recommender systems, PageRank

**Week 4:** **Overfitting and Regularization**

Ridge regression

The Lasso and proximal point algorithms

Model selection, cross-validation

Applications: image deblurring, compressed sensing

**Weeks 5-6:** **Beyond Least Squares: Alternate Loss Functions**

Hinge loss

Logistic regression

Feature functions and nonlinear regression and classification

Kernel methods and support vector machines

Application: Handwritten digit classification

**Week 7:** **Iterative Methods**

Stochastic Gradient Descent (SGD)

Neural networks and backpropagation

**Week 8:** **Statistical Models**

Density estimation and maximum likelihood estimation

Gaussian mixture models and Expectation Maximization

Unsupervised learning and clustering

Application: text classification

**Week 9:** **Ensemble Methods**

AdaBoost

Decision trees

Random forests, bagging

Application: electronic health record analysis