Course Syllabus

 CS 6958 - Advanced Computer Vision (Spring 2026)

 

Time: Monday and Wednesday, 04:35 PM - 05:55 PM

Location: WEB L126

Instructor: Ziad Al-Halah
    Office Hours: Thursday 4:00-5:00 pm in MEB 2176, or by appointment
    Email: ziad.al-halah@utah.edu

TA: Brian Cho
    Office Hours: Tuesdays 2:00-4:00 pm and Fridays 9:30-11:30 am, in MEB 3515
    Email: brian.cho@utah.edu

 

Course Description

This course explores recent advances in computer vision, emphasizing state-of-the-art methods, their strengths and limitations, and emerging research opportunities. Students will also complete a self-guided research project that develops or extends ideas from the course.

Topics covered in this course include (for details, see the schedule at the end of the page):

  • Object Detection and Segmentation
  • Video Understanding
  • Self-Supervised Learning
  • Zero-shot and Few-shot Learning
  • Transfer Learning
  • Vision and Language
  • Vision and Audio
  • Vision for Robotics
  • Egocentric Perception

 

Prerequisites

This is an advanced graduate-level course, not an introduction to computer vision. Prior coursework in machine learning and deep learning (e.g., CS 5350/6350 or CS 5353/6353) is recommended. Strong programming skills, preferably in Python, are required. If you are unsure whether the course is appropriate for your background, please consult the instructor.

 

Learning Objectives

By the end of the course, students will be able to:

1. Understand state-of-the-art computer vision methods for tasks such as object detection, video understanding, and visual representation learning.
2. Explain key principles behind learning paradigms including self-supervised, zero-shot, few-shot, and transfer learning, and describe how they are applied in vision.
3. Evaluate and compare models in terms of accuracy, generalization, interpretability, and computational cost, and select appropriate models for specific tasks.
4. Design and run experiments using standard datasets and metrics, and present a research project that leverages advanced computer vision techniques.
5. Write clear technical reports and deliver concise presentations that effectively communicate research findings and their relation to prior work.

 

Course Structure

The course has two phases. The first consists of instructor-led lectures that introduce core topics and foundational methods in advanced computer vision, including detailed analyses of influential approaches and papers. In the second phase, Paper Reading & Discussion, students read selected papers, present insights, and lead discussions.

 

Assessments & Grading

Students will work individually or in groups on a capstone project that investigates a research question, proposes a novel idea, or replicates and extends a recent paper. Students will also read and discuss assigned papers, prepare short analyses, and contribute to weekly discussions. Additional assessments include a few programming assignments and a midterm exam covering material from the first half of the course. The grade breakdown is listed below; a short example of how the weights combine follows the list.

  • 40% Final Project (proposal, milestones, report, presentation)
  • 30% Paper Analysis & Participation (presentation, discussion, peer feedback)
  • 15% Midterm Exam
  • 15% Assignments
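
To make the weighting concrete, here is a minimal, unofficial sketch of how the listed weights could combine, assuming each component is scored out of 100 and the final grade is a simple weighted sum; the component names and the example scores are illustrative only, and the grade computed in Canvas is authoritative.

```python
# Unofficial sketch: weighted course grade from per-component scores (0-100).
WEIGHTS = {
    "final_project": 0.40,
    "paper_analysis_participation": 0.30,
    "midterm": 0.15,
    "assignments": 0.15,
}

def final_grade(scores):
    """Combine per-component scores into a weighted course grade."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Example (hypothetical scores):
print(final_grade({
    "final_project": 92,
    "paper_analysis_participation": 88,
    "midterm": 75,
    "assignments": 80,
}))  # 0.40*92 + 0.30*88 + 0.15*75 + 0.15*80 = 86.45
```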

 

Course Materials

 

Late Policy

Each student has 72 late hours (3 days) to use throughout the semester without penalty. These hours apply automatically.

After the grace hours are exhausted, late submissions incur a 1% penalty per hour, up to a maximum deduction of 72% (three days late). Assignments cannot be submitted more than three days past the deadline; after that point, the submission portal closes and the assignment receives a zero, even if grace hours remain.
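
To make the policy concrete, here is a minimal, unofficial sketch of how a late deduction could be computed under the rules above, assuming grace hours are consumed first and the penalty then accrues at 1% per late hour; the function name and its exact rounding behavior are illustrative assumptions, and the deduction recorded in Canvas is authoritative.

```python
def late_deduction(hours_late, grace_hours_left):
    """Illustrative sketch of the late policy (unofficial).

    Assumes grace hours are applied first, the penalty then accrues at
    1% per late hour, and the portal closes 72 hours after the deadline.
    Returns (percent deduction, grace hours remaining).
    """
    if hours_late <= 0:
        return 0, grace_hours_left           # on time: no penalty, grace untouched
    if hours_late > 72:
        return 100, grace_hours_left         # portal closed: assignment receives a zero
    grace_used = min(hours_late, grace_hours_left)
    penalty = hours_late - grace_used        # 1% per late hour not covered by grace
    return penalty, grace_hours_left - grace_used

# Example: submitting 30 hours late with 10 grace hours left uses all 10
# grace hours and costs a 20% deduction.
print(late_deduction(30, 10))  # (20, 0)
```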

If you anticipate needing an extension due to extenuating circumstances, notify the teaching staff as early as possible.

 

Regrade Requests

Regrade requests must be submitted within one week of the grade release for an assignment. Requests submitted after this window will not be considered.

 

Use of AI Tools

You may use any tools to write code, debug, run experiments, understand papers, and brainstorm ideas. However, all written work—such as reports and reviews—must be completed by you without using generative AI tools. You may use tools like ChatGPT while working on assignments and the final project, but the final written results, findings, and reflections must be entirely your own. Writing independently helps you think more deeply about the material and supports your learning.


Academic Misconduct

It is expected that students comply with University of Utah policies regarding academic honesty, including but not limited to refraining from cheating, plagiarizing, misrepresenting one’s work, and/or inappropriately collaborating. This includes the use of generative artificial intelligence (AI) tools without citation, documentation, or authorization. Students are expected to adhere to the prescribed professional and ethical standards of the profession/discipline for which they are preparing. Any student who engages in academic dishonesty or who violates the professional and ethical standards for their profession/discipline may be subject to academic sanctions as per the University of Utah’s Student Code: Policy 6-410: Student Academic Performance, Academic Conduct, and Professional and Ethical Conduct.

Plagiarism and cheating are serious offenses and may be punished by failure on an individual assignment and/or failure in the course. Academic misconduct, according to the University of Utah Student Code:

“...Includes, but is not limited to, cheating, misrepresenting one’s work, inappropriately collaborating, plagiarism, and fabrication or falsification of information…It also includes facilitating academic misconduct by intentionally helping or attempting to help another to commit an act of academic misconduct.”

For details on plagiarism and other important course conduct issues, see the U's Code of Student Rights and Responsibilities.

 

Further Course Administrative Information

 

 

Schedule

The following schedule is tentative and subject to change at the discretion of the instructor throughout the semester. Students should refer to the due dates listed on each assignment in Canvas or as announced by the instructor for the most accurate and up-to-date deadlines.

 

No. | Date | Title | Resources / Notes
1 | 01/07 | Introduction | 01_introduction.pdf
2 | 01/12 | CNNs | 02_cnns.pdf; Papers: AlexNet, MobileNet, ResNet
3 | 01/14 | Transformers | 03_transformers.pdf; Papers: Transformer, ViT
4 | 01/19 | No Class – MLK holiday |
5 | 01/21 | Object Detection | 04_obj_detection.pdf; Papers: Faster R-CNN, YOLO, DETR; HW1 is out
6 | 01/26 | Segmentation | 05_segmentation.pdf; Papers: FCN, DeconvNet, UNet, PSPNet, DeepLabV3, Mask R-CNN, DETR
7 | 01/28 | Representation Learning | 06_representation.pdf; Papers: Colorization, SimCLR, MoCoV3, MAE
8 | 02/02 | Image Generation and Manipulation | 07_image_gen.pdf; Topic Selection is due
9 | 02/04 | Visual Explanation / Interpretation | HW1 is due
10 | 02/09 | No Class – Workshop | HW2 is out
11 | 02/11 | Video |
12 | 02/16 | No Class – Presidents' Day holiday |
13 | 02/18 | Vision and Language | HW2 is due
14 | 02/23 | Vision and Audio | Project proposal is due
15 | 02/25 | Buffer Lecture |
16 | 03/02 | Ethical Computer Vision | Review 01 is due
17 | 03/04 | Midterm |
18 | 03/09 | Spring Break |
19 | 03/11 | Spring Break |
20 | 03/16 | Depth and Geometry Estimation | Papers: * DUSt3R: Geometric 3D Vision Made Easy; VGGT: Visual Geometry Grounded Transformer
21 | 03/18 | 3D Understanding | Papers: * 3D Scene Understanding with Open Vocabularies; Masked Autoencoders for Point Cloud Self-supervised Learning; Review 02 is due
22 | 03/23 | Large Vision Language Models | Papers: * Visual Instruction Tuning; InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
23 | 03/25 | Action and Manipulation | Papers: * OpenVLA: An Open-Source Vision-Language-Action Model; VIMA: General Robot Manipulation with Multimodal Prompts; Review 03 is due
24 | 03/30 | Touch and Vision | Papers: * Binding Touch to Everything: Learning Unified Multimodal Tactile Representations; Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play; Project Progress Report is due
25 | 04/01 | Audio-Visual Learning | Papers: * From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation; SAM Audio: Segment Anything in Audio; Review 04 is due
26 | 04/06 | Long-form Video Understanding | Papers: * VideoAgent: Long-form Video Understanding with Large Language Model as Agent; Efficient Video Transformers with Spatial-Temporal Token Selection
27 | 04/08 | Procedural Videos | Papers: * SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos; HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World; Review 05 is due
28 | 04/13 | Vision for Science | Papers: * A solution to generalized learning from small training sets found in infants’ repeated visual experiences of individual objects; Toddler-Inspired Visual Object Learning
29 | 04/15 | World Models | Papers: * PlayerOne: Egocentric World Simulator; Learning Interactive Real-World Simulators
30 | 04/20 | Project Presentation |