Course Syllabus

 CS 6958 - Advanced Computer Vision (Spring 2026)

 

Time: Monday and Wednesday, 4:35 PM - 5:55 PM

Location: WEB L126

Instructor: Ziad Al-Halah
Office Hours: Thursday 4:00-5:00 pm in MEB 2176, or by appointment
ziad.al-halah@utah.edu

TA: Brian Cho
Office Hours: Tuesdays 2:00-4:00 pm and Fridays 9:30-11:30 am, in MEB 3515
brian.cho@utah.edu

 

Course Description

This course explores recent advances in computer vision, emphasizing state-of-the-art methods, their strengths and limitations, and emerging research opportunities. Students will also complete a self-guided research project that develops or extends ideas from the course.

Topics covered in this course include (for details, see the schedule at the end of the page):

  • Object Detection and Segmentation
  • Video Understanding
  • Self-Supervised Learning
  • Zero-shot and Few-shot Learning
  • Transfer Learning
  • Vision and Language
  • Vision and Audio
  • Vision for Robotics
  • Egocentric Perception

 

Prerequisites

This is an advanced graduate-level course, not an introduction to computer vision. Prior coursework in machine learning and deep learning (e.g., CS 5350/6350 or CS 5353/6353) is recommended. Strong programming skills, preferably in Python, are required. If you are unsure whether the course is appropriate for your background, please consult the instructor.

 

Learning Objectives

By the end of the course, students will be able to:

1. Understand state-of-the-art computer vision methods for tasks such as object detection, video understanding, and visual representation learning.
2. Explain key principles behind learning paradigms including self-supervised, zero-shot, few-shot, and transfer learning, and describe how they are applied in vision.
3. Evaluate and compare models in terms of accuracy, generalization, interpretability, and computational cost, and select appropriate models for specific tasks.
4. Design and run experiments using standard datasets and metrics, and present a research project that leverages advanced computer vision techniques.
5. Write clear technical reports and deliver concise presentations that effectively communicate research findings and their relation to prior work.

 

Course Structure

The course has two phases. The first consists of instructor-led lectures that introduce core topics and foundational methods in advanced computer vision, including detailed analyses of influential approaches and papers. In the second phase, Paper Reading & Discussion, students read selected papers, present insights, and lead discussions.

 

Assessments & Grading

Students will work individually or in groups on a capstone project that investigates a research question, proposes a novel idea, or replicates and extends a recent paper. Students will also read and discuss assigned papers, prepare short analyses, and contribute to weekly discussions. Additional assessments include a few programming assignments and a midterm exam covering material from the first half of the course. The final grade is a weighted average of the components below (a short illustration follows the list).

  • 40% Final Project (proposal, milestones, report, presentation)
  • 30% Paper Analysis & Participation (presentation, discussion, peer feedback)
  • 15% Midterm Exam
  • 15% Assignments
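
For concreteness, here is a minimal Python sketch of how these weights combine into a final percentage. It is only an illustration: the component names and the 0-100 score scale are our assumptions, not an official grading script.

# Hypothetical sketch of the grade weighting (not an official script).
# Assumes each component is scored as a percentage on a 0-100 scale.
WEIGHTS = {
    "final_project": 0.40,   # proposal, milestones, report, presentation
    "paper_analysis": 0.30,  # presentation, discussion, peer feedback
    "midterm": 0.15,
    "assignments": 0.15,
}

def final_grade(scores: dict[str, float]) -> float:
    """Return the weighted average of the component scores."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())

# Example: final_grade({"final_project": 92, "paper_analysis": 88,
#                       "midterm": 75, "assignments": 90}) -> 87.95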

 

Course Materials

 

Late Policy

Each student has 72 late hours (3 days) to use throughout the semester without penalty. These hours apply automatically.

After the grace hours are exhausted, late submissions incur a 1% penalty per hour, up to a maximum deduction of 72% (three days late). Assignments cannot be submitted more than three days past the deadline; after that point, the submission portal closes and the assignment receives a zero, even if grace hours remain.
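
As a concrete illustration of how grace hours and the hourly penalty interact, here is a minimal Python sketch; the function name and arguments are ours, not an official calculator.

def late_deduction(hours_late: float, grace_remaining: float) -> float:
    """Return the percentage deducted from an assignment score.

    Grace hours apply automatically before any penalty accrues; after
    they are exhausted, each late hour costs 1%, up to a 72% cap.
    More than 72 hours past the deadline, the portal closes and the
    submission scores zero, even if grace hours remain.
    """
    if hours_late <= 0:
        return 0.0                      # on time: no deduction
    if hours_late > 72:
        return 100.0                    # portal closed: zero score
    penalized_hours = max(0.0, hours_late - grace_remaining)
    return min(penalized_hours, 72.0)   # 1% per late hour, capped at 72%

# Example: 10 hours late with 4 grace hours remaining -> 6% deduction.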

If you anticipate needing an extension due to extenuating circumstances, notify the teaching staff as early as possible.

 

Regrade Requests

Regrade requests must be submitted within one week of the grade release for an assignment. Requests submitted after this window will not be considered.

 

Use of AI Tools

You may use any tools to write code, debug, run experiments, understand papers, and brainstorm ideas. However, all written work, such as reports and reviews, must be completed by you without generative AI tools. In other words, you may use tools like ChatGPT while working on assignments and the final project, but the final written results, findings, and reflections must be entirely your own. Writing independently helps you think more deeply about the material and supports your learning.


Academic Misconduct

It is expected that students comply with University of Utah policies regarding academic honesty, including but not limited to refraining from cheating, plagiarizing, misrepresenting one’s work, and/or inappropriately collaborating. This includes the use of generative artificial intelligence (AI) tools without citation, documentation, or authorization. Students are expected to adhere to the prescribed professional and ethical standards of the profession/discipline for which they are preparing. Any student who engages in academic dishonesty or who violates the professional and ethical standards for their profession/discipline may be subject to academic sanctions as per the University of Utah’s Student Code: Policy 6-410: Student Academic Performance, Academic Conduct, and Professional and Ethical Conduct.

Plagiarism and cheating are serious offenses and may be punished by failure on an individual assignment and/or failure in the course. Academic misconduct, according to the University of Utah Student Code:

“...Includes, but is not limited to, cheating, misrepresenting one’s work, inappropriately collaborating, plagiarism, and fabrication or falsification of information…It also includes facilitating academic misconduct by intentionally helping or attempting to help another to commit an act of academic misconduct.”

For details on plagiarism and other important course conduct issues, see the U's Code of Student Rights and Responsibilities.

 

Further Course Administrative Information

 

 

Schedule

The following schedule is tentative and subject to change at the discretion of the instructor throughout the semester. Students should refer to the due dates listed on each assignment in Canvas or as announced by the instructor for the most accurate and up-to-date deadlines.

 

Each entry lists the session number, date, and title, followed by resources (slides, readings) and deadlines where applicable.

1. 01/07 - Introduction
   01_introduction.pdf

2. 01/12 - CNNs
   02_cnns.pdf
   Reading: AlexNet, MobileNet, ResNet

3. 01/14 - Transformers
   03_transformers.pdf
   Reading: Transformer, ViT

4. 01/19 - MLK holiday (no class)

5. 01/21 - Object Detection
   04_obj_detection.pdf
   Reading: Faster R-CNN, YOLO, DETR
   HW1 is out

6. 01/26 - Segmentation
   05_segmentation.pdf
   Reading: FCN, DeconvNet, UNet, PSPNet, DeepLabV3, Mask R-CNN, DETR

7. 01/28 - Representation Learning
   06_representation.pdf
   Reading: Colorization, SimCLR, MoCoV3, MAE

8. 02/02 - Image Generation and Manipulation
   07_image_gen.pdf
   Reading: Tutorial on VAE, GAN, DC-GAN, Progressive GAN, Pix2Pix
   Topic Selection is due

9. 02/04 - Visual Explanation / Interpretation (no class)
   HW1 is due

10. 02/09 - No class (workshop)

11. 02/11 - Video
    09_video.pdf
    Reading: Fusion & Multires CNN, C3D, Two-Stream CNN, I3D, ViViT, MViTv2
    HW2 is out

12. 02/16 - Presidents' Day holiday (no class)

13. 02/18 - Vision and Language
    10_vision_and_language.pdf
    Reading: CLIP, Flamingo, DeViSE

14. 02/23 - Vision and Audio
    Project proposal is due

15. 02/25 - Guest Lecture (MEB 3485, 2:00 pm)
    Responsible AI Evaluation: "What"s and "How"s of Evaluation
    Dr. Negar Rostamzadeh (Google Research)
    HW2 is due

16. 03/02 - Guest Lecture (MEB 3147, 2:00 pm)
    Beyond Flat Embeddings: Evolving Multimodal Representations for Reasoning and Retrieval
    Dr. Loris Bazzani (University of Verona)
    Review 01 is due

17. 03/04 - Midterm

18. 03/09 - Spring Break (no class)

19. 03/11 - Spring Break (no class)

20. 03/16 - Depth and Geometry Estimation
    * DUSt3R: Geometric 3D Vision Made Easy
    VGGT: Visual Geometry Grounded Transformer

21. 03/18 - 3D Understanding
    * 3D Scene Understanding with Open Vocabularies
    Masked Autoencoders for Point Cloud Self-supervised Learning
    Review 02 is due

22. 03/23 - Large Vision Language Models
    * Visual Instruction Tuning
    InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

23. 03/25 - Action and Manipulation
    * OpenVLA: An Open-Source Vision-Language-Action Model
    VIMA: General Robot Manipulation with Multimodal Prompts
    Review 03 is due

24. 03/30 - Touch and Vision
    * Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
    Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play
    Project Progress Report is due

25. 04/01 - Audio-Visual Learning
    * From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
    SAM Audio: Segment Anything in Audio
    Review 04 is due

26. 04/06 - Long-form Video Understanding
    * VideoAgent: Long-form Video Understanding with Large Language Model as Agent
    Efficient Video Transformers with Spatial-Temporal Token Selection

27. 04/08 - Procedural Videos
    * SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos
    HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World
    Review 05 is due

28. 04/13 - Vision for Science
    * A solution to generalized learning from small training sets found in infants' repeated visual experiences of individual objects
    Toddler-Inspired Visual Object Learning

29. 04/15 - World Models
    * PlayerOne: Egocentric World Simulator
    Learning Interactive Real-World Simulators

30. 04/20 - Project Presentation