CS 6960-001 Spring 2025 Theory of Machine Learning
Learning algorithms are ubiquitous, and play an ever-increasing role in our lives. How are they different from "usual" algorithms? How can we reason about the ability of an algorithm to "learn from examples", and work with data it has never seen? We will introduce the formal notions of learnability, generalization, supervised vs. unsupervised learning, optimization, and so on. The goal is also to introduce the fundamental techniques and models that underlie some of the more recent developments in ML.
Background and pre-requisites
In order to appreciate the content, some knowledge of basic machine learning methods is recommended. Background in basic algorithm design (CS 4150 or 5150), probability, and linear algebra will also be assumed. We will routinely work with multivariate functions, their derivatives, etc., so a working knowledge of these ideas is expected.
Logistics
Class time/days: 11:50 - 1:20 on Mondays and Wednesdays
Location: WBB 617 (Geology building, above Two-Creek Coffee)
Instructor Office Hours: Wednesday, 2-3pm. I can also hold extra office hours by appointment; please send email to a.bhaskara@utah.edu
TA: Prasanth Yalamanchili
TA Office Hours: Friday, 2-4 PM, MEB 3145
Topics
Here is a rough list of topics that we will cover. For more details, see the Schedule below.
- Statistical learning theory: the PAC model
- VC dimension, generalization bounds, Rademacher complexity
- Convex optimization: gradient descent and its variants
- Online learning and bandits
- Neural networks: power and limitations
- Generalization in neural networks
- Learning embeddings, unsupervised and self-supervised learning
- State space models and transformers
Grading and Logistics
There will be about four homeworks, together contributing 60% to your grade. HWs will primarily be theory, but will occasionally involve implementation. There will also be a take-home midterm exam (that you hand in two days after it is released) and a course project; these will each contribute 20% to your final grade. The course project involves a 15-30 minute presentation on a topic of your choice from a list of candidate topics that will be provided halfway into the semester.
Resources
For the first half of this course, we will use the textbook Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David. A copy of the book is available online from the authors' page (for personal use). Another useful reference is the more recent book by Francis Bach, which is also freely available.
Semester guidelines, other policies. The Price College of Engineering provides a set of guidelines each semester. Please go over this document for policies on withdrawal from courses, information on safety, etc.
Academic misconduct. This is an advanced topics course, so you are encouraged to discuss with your friends and classmates. However, you should write down solutions on your own. Any signs of re-using solutions or code will be viewed as cheating and will be subject to the standard departmental procedures. In a similar vein, you are welcome to use LLMs to clean up solutions or to get some hints on how to approach a HW problem, but note that you are taking responsibility for the produced content (LLMs are still pretty bad at reasoning).
Course Schedule
Here we will post the lecture schedule, notes and references whenever appropriate.
Week 1:
Course overview, brief history of ML, and introduction to Valiant's PAC model
Lecture 1: Logistics, course outline, introduction to generalization. [Video]
Lecture 2: Valiant's PAC model, loss minimization and generalization for finite hypothesis classes. [Video]
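As a companion to this lecture, here is the standard uniform-convergence bound for a finite hypothesis class (Hoeffding's inequality plus a union bound), written as a sketch; the notation follows the Shalev-Shwartz and Ben-David book, with L_D the true risk and L_S the empirical risk on m i.i.d. samples.

```latex
% Uniform convergence for a finite hypothesis class H:
% with probability at least 1 - delta over an i.i.d. sample S of size m,
\[
  \bigl| L_{\mathcal{D}}(h) - L_{S}(h) \bigr|
  \;\le\; \sqrt{\frac{\ln\!\bigl(2|\mathcal{H}|/\delta\bigr)}{2m}}
  \qquad \text{simultaneously for all } h \in \mathcal{H},
\]
% obtained by applying Hoeffding's inequality to each h and a union bound over H.
```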
Week 2:
Lecture 3: Generalization bounds for infinite hypothesis classes, VC dimension. [Video]
Lecture 4: VC dimension, growth functions, and examples. [Video]
Week 3:
Lecture 5: PAC model overview, Rademacher complexity (distribution-dependent generalization bounds). [Video] [Slides]
Week 4:
Lecture 6: Loss minimization for a parametrized hypothesis class: optimization and loss functions. Linear classifiers and the perceptron. [Video]
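To make the perceptron from this lecture concrete, here is a minimal sketch of the mistake-driven update on a made-up separable dataset (an illustration, not course code).

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Classic perceptron: X is (n, d), labels y are in {-1, +1}.
    Returns a weight vector w (bias folded in as the last coordinate)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * np.dot(w, xi) <= 0:   # misclassified (or on the boundary)
                w += yi * xi              # perceptron update
                mistakes += 1
        if mistakes == 0:                 # data separated; stop early
            break
    return w

# toy linearly separable data (made up for illustration)
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
print("learned weights:", perceptron(X, y))
```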
Lecture 7: Convex loss functions, convexity, gradient descent. [Video]
Week 5:
Lecture 8: Analysis of gradient descent, projected gradient descent. [Video]
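As a small illustration of projected gradient descent from this lecture, here is a sketch on a toy constrained problem; the quadratic objective, the unit-ball constraint, and the step size are assumptions made for the example.

```python
import numpy as np

def project_onto_ball(x, radius=1.0):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def projected_gd(grad, x0, step, n_iters, radius=1.0):
    """Projected gradient descent: x_{t+1} = Proj(x_t - step * grad(x_t))."""
    x = x0.copy()
    for _ in range(n_iters):
        x = project_onto_ball(x - step * grad(x), radius)
    return x

# toy objective f(x) = ||x - c||^2 with c outside the unit ball,
# so the constrained minimizer is the projection of c onto the ball
c = np.array([2.0, 1.0])
grad = lambda x: 2.0 * (x - c)
x_star = projected_gd(grad, x0=np.zeros(2), step=0.1, n_iters=200)
print(x_star, project_onto_ball(c))  # the two should roughly agree
```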
Lecture 9: Stochastic gradient descent, online convex optimization. [Video]
Reading: Sections 3.1 and 3.4 of Elad Hazan's notes.
Week 6:
Lecture 10: Analysis of stochastic gradient descent, gradient descent under smoothness. [Video]
Lecture 11: Gradient descent with strong convexity. [Video]
Week 7:
Lecture 12: Second-order methods (Newton's method), momentum, AdaGrad. [Video]
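A small sketch of the AdaGrad update mentioned in this lecture, run on a made-up badly-scaled quadratic; the toy objective, step size, and iteration count are assumptions of the example, not course code.

```python
import numpy as np

def adagrad(grad, x0, step=0.5, n_iters=500, eps=1e-8):
    """Diagonal AdaGrad: per-coordinate step sizes shrink with the
    accumulated squared gradients (a sketch of the update rule)."""
    x = x0.copy()
    g_sq = np.zeros_like(x)              # running sum of squared gradients
    for _ in range(n_iters):
        g = grad(x)
        g_sq += g * g
        x -= step * g / (np.sqrt(g_sq) + eps)
    return x

# toy badly-scaled quadratic f(x) = 0.5 * (100*x1^2 + x2^2)
grad = lambda x: np.array([100.0 * x[0], x[1]])
print(adagrad(grad, x0=np.array([1.0, 1.0])))  # should move toward the origin
```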
Week 8:
Lecture 13: Regularization; non-convex optimization (Chapter 6 of the book Theory of Deep Learning). [Video]
Lecture 14: Non-convex optimization; introduction to neural networks and the universal approximation theorem. [Video]
Week 9:
Lecture 15: Expressivity (contd.): universal approximation theorems (Cybenko's theorem, Barron's theorem), trade-offs between depth and width, hardness of training neural nets (Blum and Rivest).
Resources: Telgarsky's lecture notes. For Barron's theorem, these notes are a bit simpler. The original paper of Blum and Rivest on hardness of training. For width-vs-depth trade-offs, see this paper. [Video]
Lecture 16: Training NNs in practice: the backpropagation algorithm. [Video]
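A minimal, self-contained sketch of backpropagation for a one-hidden-layer network, with the gradients written out by hand; the tanh architecture and the sin regression task are made up for illustration and are not the in-class example.

```python
import numpy as np

rng = np.random.default_rng(0)

# tiny made-up regression task: learn y = sin(x) on a few points
X = np.linspace(-3, 3, 40).reshape(-1, 1)
Y = np.sin(X)

# one hidden layer with tanh activations
W1, b1 = rng.normal(scale=0.5, size=(1, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.5, size=(16, 1)), np.zeros(1)
lr = 0.05

for step in range(2000):
    # forward pass
    H = np.tanh(X @ W1 + b1)          # hidden activations, shape (n, 16)
    pred = H @ W2 + b2                # network output, shape (n, 1)
    loss = np.mean((pred - Y) ** 2)

    # backward pass (chain rule, written out by hand)
    n = X.shape[0]
    d_pred = 2.0 * (pred - Y) / n     # dL/dpred
    dW2 = H.T @ d_pred
    db2 = d_pred.sum(axis=0)
    dH = d_pred @ W2.T
    dZ1 = dH * (1.0 - H ** 2)         # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0)

    # gradient step
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("final training MSE:", loss)
```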
Week 10: Spring Break
Week 11:
(Monday lecture canceled due to instructor illness)
Lecture 17: Practical aspects of training neural nets: dropout for improving generalization; CNNs for parameter reduction in vision/audio applications. [Video]
[Mid-term exam]
Week 12:
Lecture 18: Conclusion of the discussion of neural-net basics; network compression for better generalization. [Video]
Lecture 19: Kernel regression, neural tangent kernel. [Video]
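A small sketch of kernel (ridge) regression with a Gaussian kernel, the classical setup that the neural tangent kernel discussion builds on; the RBF kernel choice, the regularization parameter, and the toy data are assumptions of the example.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and of B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def kernel_ridge_fit(X, y, lam=1e-2, gamma=1.0):
    """Solve (K + lam*I) alpha = y; predictions are K(x_test, X) @ alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

# made-up 1-D regression problem
X = np.linspace(0, 2 * np.pi, 30).reshape(-1, 1)
y = np.sin(X).ravel()
alpha = kernel_ridge_fit(X, y)
X_test = np.array([[1.0], [2.0]])
pred = rbf_kernel(X_test, X) @ alpha
print(pred, np.sin([1.0, 2.0]))   # predictions vs. true values
```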
Week 13:
Lecture 20: Generative models, Gaussian fitting, maximum likelihood estimation, Pearson's method of moments. [Video]
Lecture 21: Gaussian mixture models, ELBO optimization via the EM algorithm, the method of moments. [Video]
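A sketch of the EM iterations for a two-component, one-dimensional Gaussian mixture; the synthetic data, the initialization, and the number of iterations are made up for illustration and are not from the lecture.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# made-up 1-D data drawn from two Gaussians (for illustration only)
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.7, 200)])

# initial mixture weights, means, and standard deviations
pi, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(100):
    # E-step: posterior responsibility of each component for each point
    dens = np.stack([p * norm.pdf(data, m, s) for p, m, s in zip(pi, mu, sigma)])
    resp = dens / dens.sum(axis=0)                    # shape (2, n)

    # M-step: re-estimate weights, means, and standard deviations
    nk = resp.sum(axis=1)
    pi = nk / len(data)
    mu = (resp @ data) / nk
    sigma = np.sqrt((resp * (data - mu[:, None]) ** 2).sum(axis=1) / nk)

print("weights:", pi, "means:", mu, "stds:", sigma)
```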
Week 14:
Lecture 22: Deep generative modelling. [Video]
References for the lecture: A Study on the Evaluation of Generative Models, Do GANs Learn the Distribution?, and, for longer reading, Calibrated Language Models Must Hallucinate.