Course Syllabus
CS 6958 - Advanced Computer Vision (Spring 2026)
Time: Monday and Wednesday, 4:35-5:55 PM
Location: WEB L126
Instructor: Ziad Al-Halah
Office Hours: Thursday 4:00-5:00 pm in MEB 2176, or by appointment
ziad.al-halah@utah.edu
TA: Brian Cho
Office Hours: Tuesdays 2:00-4:00 pm and Fridays 9:30-11:30 am, in MEB 3515
brian.cho@utah.edu
Course Description
This course explores recent advances in computer vision, emphasizing state-of-the-art methods, their strengths and limitations, and emerging research opportunities. Students will also complete a self-guided research project that develops or extends ideas from the course.
Topics covered in this course include (for details, see the schedule at the end of the page):
- Object Detection and Segmentation
- Video Understanding
- Self-Supervised Learning
- Zero-shot and Few-shot Learning
- Transfer Learning
- Vision and Language
- Vision and Audio
- Vision for Robotics
- Egocentric Perception
Prerequisites
This is an advanced graduate-level course, not an introduction to computer vision. Prior coursework in machine learning and deep learning (e.g., CS 5350/6350 or CS 5353/6353) is recommended. Strong programming skills, preferably in Python, are required. If you are unsure whether the course is appropriate for your background, please consult the instructor.
Learning Objectives
By the end of the course, students will be able to:
1. Understand state-of-the-art computer vision methods for tasks such as object detection, video understanding, and visual representation learning.
2. Explain key principles behind learning paradigms including self-supervised, zero-shot, few-shot, and transfer learning, and describe how they are applied in vision.
3. Evaluate and compare models in terms of accuracy, generalization, interpretability, and computational cost, and select appropriate models for specific tasks.
4. Design and run experiments using standard datasets and metrics, and present a research project that leverages advanced computer vision techniques.
5. Write clear technical reports and deliver concise presentations that effectively communicate research findings and their relation to prior work.
Course Structure
The course has two phases. The first consists of instructor-led lectures that introduce core topics and foundational methods in advanced computer vision, including detailed analyses of influential approaches and papers. In the second phase, Paper Reading & Discussion, students read selected papers, present insights, and lead discussions.
Assessments & Grading
Students will work individually or in groups on a capstone project that investigates a research question, proposes a novel idea, or replicates and extends a recent paper. Students will also read and discuss assigned papers, prepare short analyses, and contribute to weekly discussions. Additional assessments include a few programming assignments and a midterm exam covering material from the first half of the course. Grades are weighted as follows (a short worked example appears after the list):
- 40% Final Project (proposal, milestones, report, presentation)
- 30% Paper Analysis & Participation (presentation, discussion, peer feedback)
- 15% Midterm Exam
- 15% Assignments
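To make the weighting concrete, here is a minimal Python sketch. Only the weights come from this syllabus; the component scores and the `final_grade` helper are made up for illustration.

```python
# Hypothetical illustration of the grade weights listed above; the
# example scores are invented, only the weights are from the syllabus.
WEIGHTS = {"project": 0.40, "papers": 0.30, "midterm": 0.15, "assignments": 0.15}

def final_grade(scores: dict[str, float]) -> float:
    """Weighted sum of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# 0.40*90 + 0.30*85 + 0.15*80 + 0.15*95 = 87.75
print(final_grade({"project": 90, "papers": 85, "midterm": 80, "assignments": 95}))
```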
Course Materials
- Lecture slides (posted weekly on Canvas)
- Research papers (linked in weekly modules)
- Reference texts (all available online):
  - Goodfellow, Bengio, Courville. Deep Learning.
  - Szeliski. Computer Vision: Algorithms and Applications (2nd ed.)
  - Torralba, Isola, Freeman. Foundations of Computer Vision.
Late Policy
Each student has 72 grace hours (three days) of penalty-free lateness to use throughout the semester. These hours apply automatically.
After the grace hours are exhausted, late submissions incur a 1% penalty per hour, up to a maximum deduction of 72% (three days late). Assignments cannot be submitted more than three days past the deadline; after that point, the submission portal closes and the assignment receives a zero, even if grace hours remain.
If you anticipate needing an extension due to extenuating circumstances, notify the teaching staff as early as possible.
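To make the arithmetic above concrete, here is a minimal sketch of the policy. The `late_penalty` helper is hypothetical, not the actual grading script; only the numbers (72 grace hours, 1% per hour, 72-hour hard cutoff) come from this syllabus.

```python
# Minimal sketch of the late policy above (hypothetical helper).
def late_penalty(hours_late: float, grace_left: float) -> tuple[float, float]:
    """Return (percent deducted, grace hours remaining) for one submission."""
    if hours_late <= 0:
        return 0.0, grace_left            # on time: no penalty
    if hours_late > 72:
        return 100.0, grace_left          # portal closed: zero, even with grace left
    graced = min(hours_late, grace_left)  # grace hours apply automatically
    penalized_hours = hours_late - graced
    return min(penalized_hours, 72.0), grace_left - graced  # 1%/hour, capped at 72%

# Example: with 10 grace hours remaining, a submission 30 hours late
# spends all 10 grace hours and takes a 20% deduction for the other 20.
print(late_penalty(30.0, 10.0))  # (20.0, 0.0)
```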
Regrade Requests
Regrade requests must be submitted within one week of the grade release for an assignment. Requests submitted after this window will not be considered.
Use of AI Tools
You may use AI tools to write code, debug, run experiments, understand papers, and brainstorm ideas, including while working on assignments and the final project. However, all written work, such as reports and reviews, must be completed without generative AI: the final written results, findings, and reflections must be entirely your own. Writing independently helps you think more deeply about the material and supports your learning.
Academic Misconduct
It is expected that students comply with University of Utah policies regarding academic honesty, including but not limited to refraining from cheating, plagiarizing, misrepresenting one’s work, and/or inappropriately collaborating. This includes the use of generative artificial intelligence (AI) tools without citation, documentation, or authorization. Students are expected to adhere to the prescribed professional and ethical standards of the profession/discipline for which they are preparing. Any student who engages in academic dishonesty or who violates the professional and ethical standards for their profession/discipline may be subject to academic sanctions as per the University of Utah’s Student Code: Policy 6-410: Student Academic Performance, Academic Conduct, and Professional and Ethical Conduct.
Plagiarism and cheating are serious offenses and may be punished by failure on an individual assignment and/or failure in the course. Academic misconduct, according to the University of Utah Student Code:
“...Includes, but is not limited to, cheating, misrepresenting one’s work, inappropriately collaborating, plagiarism, and fabrication or falsification of information…It also includes facilitating academic misconduct by intentionally helping or attempting to help another to commit an act of academic misconduct.”
For details on plagiarism and other important course conduct issues, see the U's Code of Student Rights and Responsibilities.
Further Course Administrative Information
- Student Disability Accommodations: https://oeo.utah.edu/how-can-we-help/disability-access.php
- School of Computing Academic Misconduct Policy: https://www.cs.utah.edu/undergraduate/current-students/policy-statement-on-academic-misconduct/
- University of Utah Academic Misconduct Policy: https://regulations.utah.edu/academics/6-410.php#a.III.C
- U Student AI Guide: https://cte.utah.edu/instructor-education/ai-website-content/gen_ai_2024.pdf
Schedule
The following schedule is tentative and may change throughout the semester at the instructor's discretion. For the most accurate and up-to-date deadlines, refer to the due dates listed on each assignment in Canvas or announced by the instructor.
| # | Date | Title | Resources | Notes |
|---|-------|-------|-----------|-------|
| 1 | 01/07 | Introduction | 01_introduction.pdf | |
| 2 | 01/12 | CNNs | | |
| 3 | 01/14 | Transformers | Papers: Transformer, ViT | |
| 4 | 01/19 | No Class – MLK holiday | | |
| 5 | 01/21 | Object Detection | Papers: Faster R-CNN, YOLO, DETR | HW1 is out |
| 6 | 01/26 | Segmentation | Papers: FCN, DeconvNet, UNet, PSPNet, DeepLabV3, Mask R-CNN, DETR | |
| 7 | 01/28 | Representation Learning | Papers: Colorization, SimCLR, MoCoV3, MAE | |
| 8 | 02/02 | Image Generation and Manipulation | 07_image_gen.pdf | Topic Selection is due |
| 9 | 02/04 | Visual Explanation / Interpretation | | HW1 is due |
| 10 | 02/09 | No Class – Workshop | | HW2 is out |
| 11 | 02/11 | Video | | |
| 12 | 02/16 | No Class – Presidents' Day holiday | | |
| 13 | 02/18 | Vision and Language | | HW2 is due |
| 14 | 02/23 | Vision and Audio | | Project proposal is due |
| 15 | 02/25 | Buffer Lecture | | |
| 16 | 03/02 | Ethical Computer Vision | | Review 01 is due |
| 17 | 03/04 | Midterm | | |
| 18 | 03/09 | Spring Break | | |
| 19 | 03/11 | Spring Break | | |
| 20 | 03/16 | Depth and Geometry Estimation | | |
| 21 | 03/18 | 3D Understanding | Papers: 3D Scene Understanding with Open Vocabularies; Masked Autoencoders for Point Cloud Self-supervised Learning | Review 02 is due |
| 22 | 03/23 | Large Vision Language Models | Paper: InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency | |
| 23 | 03/25 | Action and Manipulation | | Review 03 is due |
| 24 | 03/30 | Touch and Vision | Papers: Binding Touch to Everything: Learning Unified Multimodal Tactile Representations; Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play | Project Progress Report is due |
| 25 | 04/01 | Audio-Visual Learning | Paper: From Vision to Audio and Beyond: A Unified Model for Audio-Visual | Review 04 is due |
| 26 | 04/06 | Long-form Video Understanding | Papers: VideoAgent: Long-form Video Understanding with Large Language Model as Agent; Efficient Video Transformers with Spatial-Temporal Token Selection | |
| 27 | 04/08 | Procedural Videos | Papers: SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos; HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World | Review 05 is due |
| 28 | 04/13 | Vision for Science | Paper: A solution to generalized learning from small training sets found in | |
| 29 | 04/15 | World Models | | |
| 30 | 04/20 | Project Presentation | | |