CS 6966-001 Spring 2022 Theory of Machine Learning
Learning algorithms are ubiquitous and play an ever-increasing role in our daily lives. How are they different from "usual" algorithms? How can we reason about the ability of an algorithm to "learn from examples" and work with data it has never seen? We will introduce the formal notions of learnability, generalization, supervised vs. unsupervised learning, optimization, and so on. The goal is also to introduce the fundamental techniques and models that underlie most of today's ML applications.
Background and pre-requisites
To appreciate the content, some knowledge of basic machine learning methods is recommended. Background in basic algorithm design (CS 4150 or 5150), probability, and linear algebra will also be assumed. We will routinely work with multivariate functions, their derivatives, etc., so a working knowledge of these ideas is expected.
Textbook/Resources
The main textbook for the course is Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David. A copy of the book (for personal use) is available online from the author's page.
Topics
Here is a rough list of topics that we will cover. For more details, see the Schedule below.
- Statistical learning theory: the PAC model
- VC dimension, generalization bounds, Rademacher complexity
- Convex optimization: gradient descent and its variants
- Online learning and boosting
- Neural networks: power and limitations
- Generalization in neural networks
- Unsupervised learning: clustering, generative models
Grading and Logistics
There will be four homeworks, one on each of the four main themes of the course (60%). There is also a course project (25%), which involves a 15-30 minute presentation on a topic of your choice from a list of candidate topics that will be provided halfway into the semester. Finally, every student registered for the course is required to prepare high-quality scribe notes for one or two of the lectures (15%). Scribe notes must be turned in within 7 days of the lecture to receive full credit.
Lecture schedule
Here is a link to a folder that will store the recorded videos: [lecture videos]
For scribes. Here are the template files for your scribe notes. The zip file contains two files: a template .tex file, which should also serve as an example, and a .cls file, which contains the formatting information for the template. You only need to modify the .tex file. If you run into compilation issues, either ask the TA or create a new project on Overleaf and work there. Here is the sign-up sheet.
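For orientation, here is a minimal sketch of what a scribe-notes .tex file typically looks like. The class name, macros, and lecture details below are illustrative placeholders, not the provided template itself; copy the actual structure from the .tex file in the zip.

```latex
% Minimal scribe-notes skeleton (illustrative only -- the real class file and
% macros come from the provided template zip; everything here is a placeholder).
\documentclass{article}  % the provided .cls file would replace "article"
\usepackage{amsmath,amssymb,amsthm}

\newtheorem{theorem}{Theorem}
\newtheorem{definition}{Definition}

\title{CS 6966: Theory of Machine Learning \\ Lecture 2: The PAC Model}
\author{Scribe: Your Name}
\date{January 13, 2022}

\begin{document}
\maketitle

\section{Recap}
Brief summary of the previous lecture.

\section{Main results}
\begin{definition}
  A hypothesis class $\mathcal{H}$ is PAC learnable if \ldots
\end{definition}

\end{document}
```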
Week 1: Introduction, basics of PAC learning
Tuesday (1/11): Introduction and course overview [slides]
Thursday (1/13): The PAC model for supervised learning [slides (not annotated)] [slides (annotated)] [scribe notes]
Week 2: Generalization in the PAC model, growth functions and VC dimension
Tuesday (1/18): No free lunch theorem, PAC learning [slides (not annotated)] [slides (annotated)] [scribe notes]
Thursday (1/20): PAC learning (contd.), growth functions [slides (not annotated)] [slides (annotated)] [scribe notes]
Week 3: VC dimension (contd.)
Tuesday (1/25): Representative samples, growth function [slides (not annotated)] [slides (annotated)] [scribe notes]
Thursday (1/27): Growth function & VC dimension [slides (not annotated)] [slides (annotated)] [scribe notes]
Week 4: Conclude Stat Learning Theory, Intro to Optimization
Tuesday (2/1): VC dimension, fundamental theorem [slides (not annotated)] [slides (annotated)] [scribe notes]
Thursday (2/3): Fundamental theorem, Intro to optimization [slides (not annotated)] [slides (annotated)] [scribe notes]
Week 5: Introduction to Optimization
Tuesday (2/8): Introduction to optimization, gradient descent [slides (not annotated)] [slides (annotated)] [scribe notes]
Thursday (2/10): Gradient descent -- vanilla analysis [slides (not annotated)] [slides (annotated)] [scribe notes]
Week 6: Optimization (continued)
Tuesday (2/15): Gradient descent, variants [slides (not annotated)] [slides (annotated)] [scribe notes]
Thursday (2/17): Stochastic gradient descent [slides (not annotated)] [slides (annotated)] [scribe notes]
Week 7: Optimization (continued)
Tuesday (2/22): More gradient descent [slides (not annotated)] [slides (annotated)] [scribe notes]
Thursday (2/24): Gradient descent for strongly convex functions [slides (not annotated)] [slides (annotated)] [scribe notes]
Week 8: Optimization (conclude)
Tuesday (3/1): Strong convexity, regularization [slides (not annotated)] [slides (annotated)] [scribe notes] [scribe notes (2)]
Thursday (3/3): Regularization, conclude optimization [slides (not annotated)] [slides (annotated)] [scribe notes]
Week 9: Introduction to neural networks
Tuesday (3/15): Introduction to Neural Networks [slides (not annotated)] [slides (annotated)] [scribe notes] [scribe notes (2)]
Thursday (3/17): Neural networks and expressibility [slides (not annotated)] [slides (annotated)] [scribe notes]
Week 10: Neural networks (continued)
Tuesday (3/22): Neural networks: Representation, Optimization [slides (not annotated)] [slides (annotated)] [scribe notes]
Week 11: Optimization in neural networks
Tuesday (3/29): Optimization in neural nets, feature learning [slides (not annotated)] [slides (annotated)] [scribe notes] [scribe notes (2)]
Thursday (3/31): Gradient descent dynamics, neural tangent kernel [slides (not annotated)] [slides (annotated)] [scribe notes]
Week 12: Representation learning
Tuesday (4/5): NTK recap, Representation learning [slides (not annotated)] [slides (annotated)] [scribe notes]
Thursday (4/7): Representation learning [slides (not annotated)] [slides (annotated)] [scribe notes] [scribe notes (2)]
Week 13: Robustness
Tuesday (4/12): Robustness in learning [slides (not annotated)] [slides (annotated)] [scribe notes]
Thursday (4/14): Robustness and adversarial examples [slides (not annotated)] [slides (annotated)] [scribe notes]