Course Syllabus

Course Staff and Schedule

Instructor: Fengjiao Wang (3102 MEB) fengjiao@cs.utah.edu

Office Hours:

    Instructor's office hour: Tuesday 1:00pm-2:00pm

    TAs' office hours: TBD

Web Page: https://utah.instructure.com/courses/1223063

TAs: Sydney Lundberg, Parker Deyoung, Tanya Yu

Lectures: Tuesday & Thursday 10:45am-12:05pm, WEB L101

Course Description

This course provides an introduction to data wrangling and prepares students for further courses in the data science curriculum. Students will learn the methods and skills involved in collecting, cleaning, and organizing data. In addition to data wrangling methods, students will learn to think critically about the ethical and social implications of using data.

Learning Objectives

By the end of this course, you will be able to

  • Obtain data from existing sources such as websites (web-scraping) and platform APIs
  • Set up and manage new data collection systems, with the focus on value-driven data collection (i.e., collecting the right data to answer a research question or achieve a business goal)
  • Clean up, aggregate, sample, reshape, and normalize data sets using the tidy data framework, statistical techniques, and existing large-scale data analytics tools
  • Perform exploratory data analysis on real-world data sets
  • Reflect on the applicability of the data wrangling methods in specific contexts and their ethical implications.

Course Design

The format of each class will vary between lectures, analysis labs, and in-class activities. Student performance will be evaluated through a combination of Reading Readiness, Module Assignments (MAs), Concept Practice Assignments, Class Participation, and a Final Project. There is no final exam.

The class is split up into six two-week modules. Each module corresponds to a fundamental data wrangling technique: (1) exploring an existing data set, (2) cleaning data into “tidier” formats, (3) retrieving new data from the web, (4) combining different data sets together, (5) inferring patterns in the data, (6) extrapolating and predicting with time series analysis.

Readings

There is no textbook required for class, but there will be required readings, tutorials, and other material, which will be made available through Canvas. Familiarity with the readings will be assessed through Perusall.

Here is the list of required course readings:

  • Leek - Elements of Data Analytic Style (Ch. 2).
  • Peng & Matsui - The Art of Data Science (Ch. 1-4, 10).
  • Data Ethics Canvas.
  • Pandas - Working with Missing Data.
  • Swalin (2018). "How to Handle Missing Data."
  • Wickham (2014) - Tidy Data.
  • Leek - Elements of Data Analytic Style (Ch. 3).
  • BeautifulSoup Cheat Sheet.
  • Densmore (2017). "Ethics in Web Scraping." Towards Data Science.
  • Chou (2016). "To scrape or not to scrape: technical and ethical challenges of collecting data off the web." Storybench. 
  • Kazil & Jarmul - Web Scraping and APIs.
  • Mitchell - Web Scraping and APIs.
  • Pandas documentation - Merge, join, and concatenate.
  • Harrison - When Is Anonymous Not Really Anonymous?
  • Pandas documentation - GroupBy: split-apply-combine.
  • Hill - The environment and disease - association or causation.
  • Nield - Ch. 1-8.
  • Nakagawa & Cuthill - Effect size, confidence interval, and statistical significance.
  • Sullivan & Feinn - Using Effect Size - why the P value is Not Enough.
  • Hedges - What are effect sizes.
  • Browne-Anderson, H. (2018) "Time Series Analysis Tutorial." DataCamp.
  • Walker, J. (2019). "Tutorial: Time Series Analysis with Pandas." Dataquest. 
  • Data Science Simplified Part 4: Simple Linear Regression Models.
  • Data Science Simplified Part 5: Multivariate Regression Models.

Grading:

Students will be evaluated through six different mechanisms.

  • Foundational Skills Assessment (1%): The first two weeks of the course focus on reviewing fundamental Python programming and basic statistical concepts essential for data wrangling. The Foundational Skills Assessment, administered in the third week, evaluates students’ readiness in these areas through fundamental operations, code interpretation, and conceptual reasoning using clean and well-structured examples.
  • Module Assignments (48%): Module Assignments are designed to build students’ skill and confidence in data wrangling and in communicating analytical findings through writing. There are six Module Assignments, one per module. Formats may vary, but each emphasizes “showing” how an analysis was performed in a tutorial-style format, “telling” what was discovered in a reporting-style narrative, or both. Each assignment is worth 8% of the final grade (48% total) and is assessed through two equally weighted components: a written submission and a corresponding in-class quiz evaluating understanding of the submitted work. Assignments are due one week after each module concludes (except the last Module Assignment), followed by the in-class quiz. Without an approved excuse, late submissions are penalized 14% per day, and no credit is awarded after one week.
  • Required Course Reading (6%): Familiarity with the required course readings ensures that students are prepared for the material presented in class. To ensure Reading Readiness, students' familiarity with the required readings will be assessed through Perusall. Follow the link under each Canvas reading assignment to open the corresponding reading in Perusall. Because the readings inform your understanding of the lectures, late reading submissions will not be accepted.
  • Class Participation (10%): To encourage active engagement with the course material, students will be graded on class participation through Poll Everywhere in-class polls. Each class's poll is worth 0.5% of the final grade. Up to 6 absences will not affect a student's participation grade (the 6 lowest poll scores are dropped).
  • Concept Practice Assignments (10%): Concept Practice Assignments are submission-based, low-stakes practice activities designed to reinforce core data-wrangling concepts introduced in each module. Students are required to submit their work for credit. These assignments use clean, well-organized datasets and focus on the correct application of data operations and on conceptual understanding, without the added complexity of messy, ambiguous, or open-ended data. There are six Concept Practice Assignments in total, one aligned with each course module. Their primary purpose is practice and reinforcement, preparing students for module tests and more comprehensive data-wrangling tasks later in the course.
  • Final Project (25%): The Final Project is intended to be a portfolio piece highlighting a student’s analytical and communicative abilities. The project can be an extension of a Module Assignment that goes deeper in data analysis and write-up. The Final Project grade will consist of five components:
    • Final Project Proposal (7%): students will submit a short document in the middle of the term detailing their proposed final project.
    • Final Project Proposal Feedback (2%): students will provide light written feedback to other students in order to improve the quality of the Final Project Proposals. 
    • Final Project Presentation (6%): students will record 5-minute videos presenting the main methods and findings of their Final Projects. Selected videos will be presented to the class on the last day of the course.
    • Final Project Presentation Feedback (2%): students will rank and optionally provide light written feedback to other students in order to improve the quality of the Final Project Presentations. Your feedback will help the course staff select the videos to be shown on the last day of class.
    • Final Project Write-up (8%): students will submit a detailed write-up of the methods used and results accomplished in their Final Project. Late submissions will not be accepted.
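
As a rough illustration of how the weights above combine, the following Python sketch computes a weighted course grade and a late-adjusted assignment score. The weights are taken from the list above. The function names and the reading of the late policy (14% of the earned score deducted per day, nothing awarded once more than 7 days have passed) are assumptions made for illustration only, not official course policy.

```python
# Component weights (percent of the final grade), taken from the Grading list.
WEIGHTS = {
    "foundational_skills": 1,
    "module_assignments": 48,
    "required_reading": 6,
    "class_participation": 10,
    "concept_practice": 10,
    "final_project": 25,
}


def final_grade(scores):
    """Combine per-component scores (each 0-100) into a 0-100 course grade."""
    # Sanity check: the listed weights should account for the whole grade.
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def late_adjusted(score, days_late):
    """Apply the late policy under ONE possible reading (an assumption):
    14% of the earned score is deducted per day late, and no credit is
    given once more than 7 days have passed."""
    if days_late <= 0:
        return score
    if days_late > 7:
        return 0.0
    return score * (1 - 0.14 * days_late)
```

For example, `final_grade({name: 100 for name in WEIGHTS})` gives a full course grade, and `late_adjusted(90, 8)` gives no credit under the assumed reading. The course staff's actual calculation may differ.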

All work should be your own. Use of generative AI for coding or writing is not permitted.

Regrade / Errors in grading

It is very important to us that all assignments are properly graded. If you believe there is an error in your assignment grading, please submit an explanation via Canvas to us within 7 days of receiving the grade. No regrade requests will be accepted orally, and no regrade requests will be accepted more than 7 days after you receive the grade for the assignment.

Communication / Getting Help

  • A key responsibility for a student in this course is to use the online Canvas class website and to check it regularly for due dates, updated materials, and corrections. Urgent messages to the whole class, such as corrections to assignments or changes in due dates, will be posted as announcements on the Canvas site.
  • To ask course staff a question privately, create a private post in Piazza so that only course staff can see it. For technical questions or clarifications on assignments, it is best to make the post public so that everyone can see the question and response and possibly offer a suggestion.
  • Students are encouraged to use Piazza for additional questions outside of class and office hours. Feel free to post questions about anything related to the class: module assignments, schedule, or material covered in class. Also feel free to answer questions; the instructor and TAs will be actively answering them as well. However, do not post potential homework answers. Such posts will be removed immediately and will not be answered.
  • Take advantage of the instructor and TA office hours (posted on the course web page). We will work hard to be accessible to students. Please send us Canvas mail if you need to meet outside of office hours. Don’t be shy if you don’t understand something: attend office hours, post a question on Piazza, or speak up in class!

Schedule

Module          Week  Dates              Due Dates
Introduction    1     Jan 5 - Jan 9
Introduction    2     Jan 12 - Jan 16
Exploring       3     Jan 19 - Jan 23    Foundational Skills Assessment
Exploring       4     Jan 26 - Jan 30    Concept Practice Assignment 1
Cleaning        5     Feb 2 - Feb 6      MA1 due
Cleaning        6     Feb 9 - Feb 13     Quiz 1; Concept Practice Assignment 2
Retrieving      7     Feb 16 - Feb 20    MA2 due
Retrieving      8     Feb 23 - Feb 27    Quiz 2; Concept Practice Assignment 3; Final Project Proposal due
Combining       9     Mar 2 - Mar 6      MA3 due
Spring Break    10    Mar 9 - Mar 13
Combining       11    Mar 16 - Mar 20    Quiz 3; Concept Practice Assignment 4
Inferring       12    Mar 23 - Mar 27    MA4 due
Inferring       13    Mar 30 - Apr 3     Quiz 4; Concept Practice Assignment 5
Extrapolating   14    Apr 6 - Apr 10     MA5 due
Extrapolating   15    Apr 13 - Apr 17    Quiz 5; Concept Practice Assignment 6
Wrap-up         16    Apr 20 - Apr 21

MA6 due
Quiz 6
Final Project due

Statistical Computing

Jupyter notebooks written in Python 3 will be used for all in-class examples and assignments. We will be using Google Colab, so students will not need to install Python locally on their machines.

Statements and Links

Academic Misconduct

College of Engineering Guidelines 

For information on withdrawing from courses, appealing grades, and more, see the College of Engineering guidelines at https://www.coe.utah.edu/students/current/semester-guidelines/.

School of Computing guidelines 

For more information on School of Computing policies and guidelines, please refer to https://handbook.cs.utah.edu/2024-2025/DS/Academics/policies.php

Safety 

The University of Utah values the safety of all campus community members. To report suspicious activity or to request a courtesy escort, call campus police at 801-585-COPS (801-585-2677). You will receive important emergency alerts and safety messages regarding campus safety via text message. For more information regarding safety and to view available training resources, including helpful videos, visit safeu.utah.edu.

Academic Accommodations

  • The University of Utah seeks to provide equal access to its programs, services and activities for people with disabilities. If you will need accommodations in the class, reasonable prior notice needs to be given to the Center for Disability and Access (http://disability.utah.edu, (801) 581-5020). CDA will work with you and the instructor to make arrangements for accommodations. Accommodations cannot be given without paperwork from this office.
  • If you are aware that you qualify as having a disability or believe that you might qualify, we encourage you to reach out to the CDA as soon as possible. You can always choose not to use accommodations recommended by the CDA, and School of Computing faculty and staff are not made aware of your arrangement until you notify them.
  • We also recognize that current circumstances can be very disruptive to established routines and strategies. We are not experts, but we encourage you (if you have not) to consider proactively establishing or re-establishing contact with appropriate groups or professionals in order to explore (for example) what routines or strategies might benefit from being updated given the current global circumstances. Also see the section below on Wellness, Resiliency, Self-Care, and Productivity.

Discrimination and Harassment

Violence and harassment based on sex and gender (which includes sexual orientation and gender identity/expression), race, national origin, color, religion, age, status as a person with a disability, veteran’s status, or genetic information are civil rights offenses. If you or someone you know has been harassed or assaulted, you are encouraged to report the incident to the Office of Equal Opportunity (OEO) or to the Office of the Dean of Students. Counseling is available at the University Counseling Center. Resources for general wellness and resiliency are available at the Center for Student Wellness. To report to the police, contact the Department of Public Safety, 801-585-2677 (COPS).

Student Names & Personal Pronouns 

Class rosters are provided to the instructor with the student’s legal name as well as “Preferred first name” (if previously entered by you in the Student Profile section of your CIS account). Please advise me of any name or pronoun changes (and update CIS) so I can help create a learning environment in which you, your name, and your pronoun will be respected. If you need assistance getting your preferred name on your UIDcard, please visit the LGBT Resource Center Room 409 in the Olpin Union Building, or email bpeacock@sa.utah.edu to schedule a time to drop by. The LGBT Resource Center hours are M-F 8am-5pm, and 8am-6pm on Tuesdays.

Wellness, Resiliency, Self-Care, and Productivity

Let us all take a moment to acknowledge that several past semesters have been different to various degrees because of the pandemic. For some students, this semester may be disrupted by the pandemic as well. Some of us may feel that this is still a disruptive time. Others may feel that the pandemic does not impact their lives too much. Regardless, maintaining or adopting new ways to proactively practice “self-care” can help maintain or improve your overall wellness and resiliency, which is valuable both for its own sake and because it can help you succeed academically. Additionally, working and studying from home removes some of the structure and “rituals” that you may have been accustomed to. You may need new coping strategies.

You might want to consider giving yourself the “homework” — especially early in the semester — of looking through wellness and work-from-home resources/opportunities and then figuring out what works for you. You could think of it as an opportunity to debug and learn how to optimize yourself!

General strategies for wellbeing include things like: getting enough sleep on a consistent schedule, getting enough exercise and sunlight, interpersonal contact, separation of “work” and “play” time and spaces, accountability structure (e.g., regularly attending lecture), and practicing time management (so that you know what you should be working on when, that you have enough time to get things done based on how things are going, and that you can put work down at the end of the day).

We encourage you to dedicate some intentional time to better understand what helps you feel (and do!) your best so that you are well-equipped for whatever the year brings. The University has resources like:

However, you don’t need to limit yourself to the resources offered by the university!