**Computer Science 25300 / 35300 & Statistics 27700**

This course is an introduction to key mathematical concepts at the heart of machine learning. The focus is on matrix methods and statistical models, with real-world applications ranging from classification and clustering to denoising and recommender systems. Mathematical topics covered include linear equations, matrix rank, subspaces, regression, regularization, the singular value decomposition, and iterative optimization algorithms. Machine learning topics include least squares classification and regression, ridge regression, principal components analysis, principal components regression, kernel methods, matrix completion, support vector machines, clustering, stochastic gradient descent, neural networks, and deep learning.

Appropriate for graduate students or advanced undergraduates. This course could be used as a precursor to TTIC 31020, “Introduction to Machine Learning,” or CMSC 35400.

### Prerequisites:

Students are expected to have taken a course in calculus and to have exposure to numerical computing (e.g., Matlab, Python, Julia, or R). Knowledge of linear algebra and statistics is not assumed.

### Textbooks:

- Matrix Methods in Data Mining and Pattern Recognition by Lars Eldén
- The Elements of Statistical Learning (12th printing, Jan. 2017) by Hastie, Tibshirani, and Friedman
- Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares by Stephen Boyd and Lieven Vandenberghe
- Pattern Recognition and Machine Learning by Christopher Bishop (optional)

The textbooks will be supplemented with additional notes and readings.

### Evaluation:

**Grading Policy**

All students will be evaluated through regular homework assignments, quizzes, and exams. The final grade will be allocated across these components as follows:

Homework (50% UG, 40% G): Homework will be assigned roughly weekly (about 8 assignments in total). Problems include both mathematical derivations and proofs as well as more applied exercises that involve writing code and working with real or synthetic data sets.

Exams (40%): Two exams (20% each).

Midterm: TBD, around Oct. 30

Final: Tuesday, Dec. 8

Quizzes (10%): Quizzes will be administered via Canvas and will cover material from the past few lectures.

Final project (grad students only, 10%)

Letter grades will be assigned using the following hard cutoffs:

| Grade | Cutoff |
| --- | --- |
| A | 93% or higher |
| A- | 90% or higher |
| B+ | 87% or higher |
| B | 83% or higher |
| B- | 80% or higher |
| C+ | 77% or higher |
| C | 60% or higher |
| D | 50% or higher |
| F | below 50% |

*Homework and quiz policy:* Your lowest quiz score and your lowest homework score will not count toward your final grade. This policy allows you to miss one quiz and one homework assignment, but only one of each. Plan accordingly.

*Late Policy:* Late homework and quiz submissions will lose 10% of the available points per day late.

*Pass/Fail Grading:* A grade of P is given only for work of C- quality or higher. Requests for Pass/Fail grading must be made in writing (as a private note on Piazza) *prior* to the day of the final exam.
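For concreteness, here is a hypothetical Python sketch of how the component weights, the drop-lowest policy, and the letter-grade cutoffs above combine. The function name and inputs are illustrative, not official course tooling.

```python
# Hypothetical illustration of the grading arithmetic above (not official policy).
def final_grade(hw, quizzes, exams, project=None):
    """hw, quizzes: lists of percent scores; exams: [midterm, final];
    project: percent score for grad students, None for undergrads."""
    hw = sorted(hw)[1:]                 # drop the lowest homework score
    quizzes = sorted(quizzes)[1:]       # drop the lowest quiz score
    hw_avg = sum(hw) / len(hw)
    quiz_avg = sum(quizzes) / len(quizzes)
    exam_avg = sum(exams) / len(exams)  # two exams at 20% each = 40% total
    if project is None:                 # undergraduate weighting
        total = 0.50 * hw_avg + 0.40 * exam_avg + 0.10 * quiz_avg
    else:                               # graduate weighting
        total = 0.40 * hw_avg + 0.40 * exam_avg + 0.10 * quiz_avg + 0.10 * project
    for cutoff, letter in [(93, "A"), (90, "A-"), (87, "B+"), (83, "B"),
                           (80, "B-"), (77, "C+"), (60, "C"), (50, "D")]:
        if total >= cutoff:
            return total, letter
    return total, "F"
```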

## Lectures from past quarters:

### Written lecture notes from Fall 2023

Lecture 1: Introduction

Lecture 2: Vectors and Matrices

Lecture 3: Least Squares and Geometry

Lecture 4: Least Squares and Optimization

Lecture 5: Subspaces and Bases

Lecture 6: Orthogonal Bases

Lecture 7: Introduction to the Singular Value Decomposition

Lecture 8: The Singular Value Decomposition

Lecture 9: SVD in Machine Learning

Lecture 10: SVD in Least Squares

Lecture 11: Kernel Methods

Lecture 12: Support Vector Machines

Lecture 13: Stochastic Gradient Descent

Lecture 14-15: Backpropagation

Lecture 16: Clustering and K-means

Lecture 17: The Expectation-Maximization Algorithm

### Videos of past lectures (from 2019–2021, imperfectly aligned with the most recent class notes)

Lecture 1: Introduction video

Lecture 2: Vectors and Matrices video 2019, video 2021

Lecture 3: Least Squares and Geometry video 2019, video 2021

Lecture 4: Least Squares and Optimization video 2019, video 2021

Lecture 5: Subspaces and Bases video 2021

Lecture 6: Subspaces, Bases, and Projections video 2019, video 2021

Lecture 7: Finding Orthogonal Bases video 2019, video 2021

Lecture 8: Introduction to the Singular Value Decomposition video, video 2021

Lecture 9: The Singular Value Decomposition video, video 2021

Lecture 10: SVD, PCA, and Dimensionality Reduction video, video 2021

Lecture 11: PCR & Ridge Regression video, video 2021

Lecture 12: Bias in ML and Matrix Completion (with notes on PageRank) video, video 2021, video on matrix completion 2021

Lecture 13: Kernel Ridge Regression video, video 2021

Lecture 14: Support Vector Machines video, video 2021

Lecture 15: Stochastic Gradient Descent video, video 2021

Lecture 16: Deeper Neural Networks video 2021

Lecture 17: Backpropagation video, video 2021

Lecture 18: Clustering and K-means video, video 2021

## Rough schedule:

**Weeks 1-2:** **Intro and Linear Models**

What is ML, and how is it related to other disciplines?

Learning goals and course objectives

Vectors and matrices in machine learning models

Features and models

Least squares, linear independence, and orthogonality (see the sketch below)

Linear classifiers

Loss, risk, generalization

Applications: bioinformatics, face recognition
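As a taste of the least squares material above, here is a minimal numpy sketch (synthetic data; all names are illustrative) that fits a linear model and uses the sign of its output as a classifier:

```python
import numpy as np

# Fit a least squares model and classify by the sign of the linear score.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))                  # 100 examples, 3 features
y = np.sign(X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(100))

w, *_ = np.linalg.lstsq(X, y, rcond=None)          # solve min_w ||Xw - y||^2
y_hat = np.sign(X @ w)                             # least squares classifier
print("training accuracy:", np.mean(y_hat == y))
```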

**Week 3:** **Singular Value Decomposition (Principal Component Analysis)**

Dimensionality reduction (see the sketch below)

Applications: recommender systems, PageRank
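A minimal numpy sketch of the SVD machinery behind this week (synthetic data; in a recommender system the matrix might hold user-item ratings):

```python
import numpy as np

# Rank-k approximation and PCA-style projection via the SVD.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
Xc = X - X.mean(axis=0)                    # center columns (needed for PCA)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 5
X_k = U[:, :k] * s[:k] @ Vt[:k, :]         # best rank-k approximation (Eckart–Young)
scores = Xc @ Vt[:k, :].T                  # project onto top-k principal directions
print("rank-k approximation error:", np.linalg.norm(Xc - X_k))
```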

**Week 4:** **Overfitting and Regularization**

Ridge regression (see the sketch below)

Model selection, cross-validation

Applications: image deblurring
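A minimal sketch of ridge regression in closed form, with a simple holdout split standing in for cross-validation (synthetic data; the lambda grid and split are arbitrary choices):

```python
import numpy as np

# Ridge regression: w = (X^T X + lam*I)^{-1} X^T y, with holdout-based lambda selection.
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 10))
y = X @ rng.standard_normal(10) + 0.3 * rng.standard_normal(80)

def ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X_tr, y_tr, X_val, y_val = X[:60], y[:60], X[60:], y[60:]
errs = {lam: np.linalg.norm(X_val @ ridge(X_tr, y_tr, lam) - y_val)
        for lam in [0.01, 0.1, 1.0, 10.0]}
print("best lambda:", min(errs, key=errs.get))
```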

**Weeks 5-6:** **Beyond Least Squares: Alternate Loss Functions**

Hinge loss

Logistic regression

Feature functions and nonlinear regression and classification

Kernel methods and support vector machines (see the sketch below)

Application: Handwritten digit classification
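A minimal sketch of kernel ridge regression with a Gaussian (RBF) kernel (synthetic data; the bandwidth and regularization values are arbitrary, and an SVM would swap in the hinge loss):

```python
import numpy as np

# Kernel ridge regression: solve (K + lam*I) alpha = y, predict f(x) = sum_i alpha_i k(x_i, x).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (60, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(60)

def rbf(A, B, bandwidth=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    return np.exp(-sq / (2 * bandwidth**2))

lam = 0.1
alpha = np.linalg.solve(rbf(X, X) + lam * np.eye(len(X)), y)
X_test = np.linspace(-3, 3, 5)[:, None]
print(rbf(X_test, X) @ alpha)                             # predictions at test points
```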

**Week 7:** **Iterative Methods**

Stochastic Gradient Descent (SGD); see the sketch below

Neural networks and backpropagation
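A minimal sketch of minibatch stochastic gradient descent on the least squares loss (synthetic data; step size, batch size, and iteration count are illustrative):

```python
import numpy as np

# Minibatch SGD for min_w (1/2)||Xw - y||^2.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
y = X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(500)

w = np.zeros(10)
step, batch = 0.01, 32
for t in range(2000):
    idx = rng.integers(0, len(X), batch)                   # sample a minibatch
    w -= step * X[idx].T @ (X[idx] @ w - y[idx]) / batch   # gradient step on the batch
print("mean squared error:", np.mean((X @ w - y) ** 2))
```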

**Week 8:** **Statistical Models**

Density estimation and maximum likelihood estimation

Gaussian mixture models and Expectation Maximization

Unsupervised learning and clustering (see the sketch below)

Application: text classification
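A minimal sketch of Lloyd's algorithm for k-means (synthetic two-cluster data; EM for a Gaussian mixture alternates analogous soft-assignment and update steps):

```python
import numpy as np

# Lloyd's algorithm: alternate nearest-center assignment and center updates.
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((50, 2)), rng.standard_normal((50, 2)) + 4])

k = 2
centers = X[rng.choice(len(X), k, replace=False)]          # random initialization
for _ in range(20):
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(axis=1)                              # assign to nearest center
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
print(centers)
```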

**Week 9:** **Ensemble Methods**

AdaBoost

Decision trees

Random forests, bagging (see the sketch below)

Application: electronic health record analysis
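A minimal sketch of bagging with decision stumps (synthetic data; the stump search and the number of bootstrap rounds are illustrative choices):

```python
import numpy as np

# Bagging: fit a weak learner on bootstrap resamples, then take a majority vote.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = np.sign(X[:, 0] + 0.5 * rng.standard_normal(200))

def fit_stump(X, y):
    """Pick the (feature, threshold, sign) stump with the best training accuracy."""
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            for s in (1, -1):
                acc = np.mean(s * np.sign(X[:, j] - t) == y)
                if best is None or acc > best[0]:
                    best = (acc, j, t, s)
    return best[1:]

stumps = [fit_stump(X[idx], y[idx])
          for idx in (rng.integers(0, len(X), len(X)) for _ in range(25))]
votes = sum(s * np.sign(X[:, j] - t) for j, t, s in stumps)
print("bagged training accuracy:", np.mean(np.sign(votes) == y))
```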