Course Syllabus

 CS 6958 - Advanced Computer Vision (Spring 2026)

 

Time: Monday and Wednesday, 04:35 PM - 05:55 PM

Location: WEB L126

Instructor: Ziad Al-Halah
    Office Hours: Thursday 4:00-5:00 pm in MEB 2176, or by appointment
    Email: ziad.al-halah@utah.edu

TA: Brian Cho
    Office Hours: Tuesdays 2:00-4:00 pm and Fridays 9:30-11:30 am, in MEB 3515
    Email: brian.cho@utah.edu

 

Course Description

This course explores recent advances in computer vision, emphasizing state-of-the-art methods, their strengths and limitations, and emerging research opportunities. Students will also complete a self-guided research project that develops or extends ideas from the course.

Topics covered in this course include (for details, see the schedule at the end of the page):

  • Object Detection and Segmentation
  • Video Understanding
  • Self-Supervised Learning
  • Zero-shot and Few-shot Learning
  • Transfer Learning
  • Vision and Language
  • Vision and Audio
  • Vision for Robotics
  • Egocentric Perception

 

Prerequisites

This is an advanced graduate-level course, not an introduction to computer vision. Prior coursework in machine learning and deep learning (e.g., CS 5350/6350 or CS 5353/6353) is recommended. Strong programming skills, preferably in Python, are required. If you are unsure whether the course is appropriate for your background, please consult the instructor.

 

Learning Objectives

By the end of the course, students will be able to:

1. Understand state-of-the-art computer vision methods for tasks such as object detection, video understanding, and visual representation learning.
2. Explain key principles behind learning paradigms including self-supervised, zero-shot, few-shot, and transfer learning, and describe how they are applied in vision.
3. Evaluate and compare models in terms of accuracy, generalization, interpretability, and computational cost, and select appropriate models for specific tasks.
4. Design and run experiments using standard datasets and metrics, and present a research project that leverages advanced computer vision techniques.
5. Write clear technical reports and deliver concise presentations that effectively communicate research findings and their relation to prior work.

 

Course Structure

The course has two phases. The first consists of instructor-led lectures that introduce core topics and foundational methods in advanced computer vision, including detailed analyses of influential approaches and papers. In the second phase, Paper Reading & Discussion, students read selected papers, present insights, and lead discussions.

 

Assessments & Grading

Students will work individually or in groups on a capstone project that investigates a research question, proposes a novel idea, or replicates and extends a recent paper. Students will also read and discuss assigned papers, prepare short analyses, and contribute to weekly discussions. Additional assessments include a few programming assignments and a midterm exam covering material from the first half of the course. The grade breakdown is listed below; a short example of how the weights combine follows the list.

  • 40% Final Project (proposal, milestones, report, presentation)
  • 30% Paper Analysis & Participation (presentation, discussion, peer feedback)
  • 15% Midterm Exam
  • 15% Assignments
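
To make the weighting concrete, here is a minimal, unofficial sketch of how the listed weights could combine, assuming each component is scored out of 100 and the final grade is a simple weighted sum; the component names and the example scores are illustrative only, and the grade computed in Canvas is authoritative.

```python
# Unofficial sketch: weighted course grade from per-component scores (0-100).
WEIGHTS = {
    "final_project": 0.40,
    "paper_analysis_participation": 0.30,
    "midterm": 0.15,
    "assignments": 0.15,
}

def final_grade(scores):
    """Combine per-component scores into a weighted course grade."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Example (hypothetical scores):
print(final_grade({
    "final_project": 92,
    "paper_analysis_participation": 88,
    "midterm": 75,
    "assignments": 80,
}))  # 0.40*92 + 0.30*88 + 0.15*75 + 0.15*80 = 86.45
```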

 

Course Materials

 

Late Policy

Each student has 72 late hours (3 days) to use throughout the semester without penalty. These hours apply automatically.

After the grace hours are exhausted, late submissions incur a 1% penalty per hour, up to a maximum deduction of 72% (three days late). Assignments cannot be submitted more than three days past the deadline; after that point, the submission portal closes and the assignment receives a zero, even if grace hours remain.
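
To make the policy concrete, here is a minimal, unofficial sketch of how a late deduction could be computed under the rules above, assuming grace hours are consumed first and the penalty then accrues at 1% per late hour; the function name and its exact rounding behavior are illustrative assumptions, and the deduction recorded in Canvas is authoritative.

```python
def late_deduction(hours_late, grace_hours_left):
    """Illustrative sketch of the late policy (unofficial).

    Assumes grace hours are applied first, the penalty then accrues at
    1% per late hour, and the portal closes 72 hours after the deadline.
    Returns (percent deduction, grace hours remaining).
    """
    if hours_late <= 0:
        return 0, grace_hours_left           # on time: no penalty, grace untouched
    if hours_late > 72:
        return 100, grace_hours_left         # portal closed: assignment receives a zero
    grace_used = min(hours_late, grace_hours_left)
    penalty = hours_late - grace_used        # 1% per late hour not covered by grace
    return penalty, grace_hours_left - grace_used

# Example: submitting 30 hours late with 10 grace hours left uses all 10
# grace hours and costs a 20% deduction.
print(late_deduction(30, 10))  # (20, 0)
```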

If you anticipate needing an extension due to extenuating circumstances, notify the teaching staff as early as possible.

 

Regrade Requests

Regrade requests must be submitted within one week of the grade release for an assignment. Requests submitted after this window will not be considered.

 

Use of AI Tools

You may use any tools to write code, debug, run experiments, understand papers, and brainstorm ideas. However, all written work—such as reports and reviews—must be completed by you without using generative AI tools. You may use tools like ChatGPT while working on assignments and the final project, but the final written results, findings, and reflections must be entirely your own. Writing independently helps you think more deeply about the material and supports your learning.


Academic Misconduct

It is expected that students comply with University of Utah policies regarding academic honesty, including but not limited to refraining from cheating, plagiarizing, misrepresenting one’s work, and/or inappropriately collaborating. This includes the use of generative artificial intelligence (AI) tools without citation, documentation, or authorization. Students are expected to adhere to the prescribed professional and ethical standards of the profession/discipline for which they are preparing. Any student who engages in academic dishonesty or who violates the professional and ethical standards for their profession/discipline may be subject to academic sanctions as per the University of Utah’s Student Code: Policy 6-410: Student Academic Performance, Academic Conduct, and Professional and Ethical Conduct.

Plagiarism and cheating are serious offenses and may be punished by failure on an individual assignment and/or failure in the course. Academic misconduct, according to the University of Utah Student Code:

“...Includes, but is not limited to, cheating, misrepresenting one’s work, inappropriately collaborating, plagiarism, and fabrication or falsification of information…It also includes facilitating academic misconduct by intentionally helping or attempting to help another to commit an act of academic misconduct.”

For details on plagiarism and other important course conduct issues, see the U's Code of Student Rights and Responsibilities.

 

Further Course Administrative Information

 

 

Schedule

The following schedule is tentative and subject to change at the discretion of the instructor throughout the semester. Students should refer to the due dates listed on each assignment in Canvas or as announced by the instructor for the most accurate and up-to-date deadlines.

 

No. | Date | Title | Resources / Notes
1 | 01/07 | Introduction | 01_introduction.pdf
2 | 01/12 | CNNs | 02_cnns.pdf; Papers: AlexNet, MobileNet, ResNet
3 | 01/14 | Transformers | 03_transformers.pdf; Papers: Transformer, ViT
4 | 01/19 | No Class – MLK holiday |
5 | 01/21 | Object Detection | 04_obj_detection.pdf; Papers: Faster R-CNN, YOLO, DETR; HW1 is out
6 | 01/26 | Segmentation | 05_segmentation.pdf; Papers: FCN, DeconvNet, UNet, PSPNet, DeepLabV3, Mask R-CNN, DETR
7 | 01/28 | Representation Learning | 06_representation.pdf; Papers: Colorization, SimCLR, MoCoV3, MAE
8 | 02/02 | Image Generation and Manipulation | 07_image_gen.pdf; Topic Selection is due
9 | 02/04 | Visual Explanation / Interpretation | HW1 is due
10 | 02/09 | No Class – Workshop | HW2 is out
11 | 02/11 | Video |
12 | 02/16 | No Class – Presidents' Day holiday |
13 | 02/18 | Vision and Language | HW2 is due
14 | 02/23 | Vision and Audio | Project proposal is due
15 | 02/25 | Buffer Lecture |
16 | 03/02 | Ethical Computer Vision | Review 01 is due
17 | 03/04 | Midterm |
18 | 03/09 | Spring Break |
19 | 03/11 | Spring Break |
20 | 03/16 | Depth and Geometry Estimation | Papers: * DUSt3R: Geometric 3D Vision Made Easy; VGGT: Visual Geometry Grounded Transformer
21 | 03/18 | 3D Understanding | Papers: * 3D Scene Understanding with Open Vocabularies; Masked Autoencoders for Point Cloud Self-supervised Learning; Review 02 is due
22 | 03/23 | Large Vision Language Models | Papers: * Visual Instruction Tuning; InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
23 | 03/25 | Action and Manipulation | Papers: * OpenVLA: An Open-Source Vision-Language-Action Model; VIMA: General Robot Manipulation with Multimodal Prompts; Review 03 is due
24 | 03/30 | Touch and Vision | Papers: * Binding Touch to Everything: Learning Unified Multimodal Tactile Representations; Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play; Project Progress Report is due
25 | 04/01 | Audio-Visual Learning | Papers: * From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation; SAM Audio: Segment Anything in Audio; Review 04 is due
26 | 04/06 | Long-form Video Understanding | Papers: * VideoAgent: Long-form Video Understanding with Large Language Model as Agent; Efficient Video Transformers with Spatial-Temporal Token Selection
27 | 04/08 | Procedural Videos | Papers: * SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos; HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World; Review 05 is due
28 | 04/13 | Vision for Science | Papers: * A solution to generalized learning from small training sets found in infants’ repeated visual experiences of individual objects; Toddler-Inspired Visual Object Learning
29 | 04/15 | World Models | Papers: * PlayerOne: Egocentric World Simulator; Learning Interactive Real-World Simulators
30 | 04/20 | Project Presentation |