Course Syllabus

Course Description

This course provides an introduction to data wrangling and prepares students for further courses in the data science curriculum. Students will learn methods and skills involved in data collection, cleaning, and organizing.  In addition to the data wrangling methods, students will also learn to think critically about the ethical and social implications of using data.

Learning Objectives

  • Obtain data from existing sources such as websites (web-scraping) and platform APIs
  • Set up and manage new data collection systems, with the focus on value-driven data collection (i.e., collecting the right data to answer a research question or achieve a business goal)
  • Clean up, aggregate, sample, reshape, and normalize data sets using regular expressions, statistical techniques, and existing large-scale data analytics tools
  • Perform exploratory data analysis on real-world data sets

Course Design

The format of each class will vary between lectures, analysis labs, and presentations. Student performance will be evaluated through a combination of Module Assignments, Module Assignment Presentations, and a Final Project. There is no final exam.

The class is split up into six two-week modules. Each module corresponds to a fundamental data wrangling technique: (1) exploring an existing data set, (2) cleaning data into “tidier” formats, (3)  retrieving new data from the web, (4) inferring patterns in the data, (5) combining different data sets together, and (6) aggregating and grouping data.

Readings

There is no textbook required for class, but there will be required readings, tutorials, and other material, which will be made available through Canvas.

Basic Information

Contact: Post all questions to Canvas (except for private matters)

Canvas: https://utah.instructure.com/courses/750276

Class Time & Location: Tuesdays & Thursdays, 10:45-12:05, BEH S 101

OR: https://utah.zoom.us/j/95276609183, recordings in Media Gallery

Office Hours:

Scheduled hours are held most weeks:

  • Dr. Kogan: Friday, 11:00am - 12:00pm, MEB 2176
  • Jordan: Monday, 1:30pm - 2:30pm, MEB 3115
  • Max: Wednesday, 1:30pm - 2:30pm, Zoom: https://utah.zoom.us/my/lisnic 

Other meetings by appointment.

Grading

Students will be evaluated through three different mechanisms.

  • Module Assignments (60%): Module Assignments are intended to develop students’ skill and confidence in using data wrangling tools and written communication to share the findings. There are six Module Assignments in total, one per module. Each Module Assignment is worth 10% of the final grade (60% cumulative), and they are due on Tuesday before the start of the next module. The format and evaluation criteria of each Module Assignment will vary, but will share two themes: “showing” how you did a data analysis in a tutorial-like format and/or “telling” what you found from a data analysis in a reporting-like format.  In the absence of an approved excuse, late submissions will be docked 2% of their value for every hour elapsed since the deadline: assignments submitted after Thursday at 10:45AM will lose all credit.
  • Module Assignment Presentations (5%): Module Assignment Presentations are intended to develop students’ communication skills by summarizing their approaches and findings and providing peer feedback. Each Thursday, students should be prepared to present on their Module Assignment’s progress, questions, and concerns. Five students will be randomly selected to present for 4 minutes each on their work-in-progress. Each student will have one chance to present throughout the semester, accounting for 5% of the final grade. Other students will provide valuable feedback, constructive critique, and suggestions. If a student has a disability, anxiety, or another issue that limits their ability to participate in this format, please email the instructor. In the absence of an approved excuse, students who do not participate in a week’s critical response process will not receive credit. There will be no make-up opportunities for missed Module Assignment Presentation credit.
  • Final Project (35%): The Final Project is intended to be a portfolio piece highlighting a student’s analytical and communicative abilities. The project will be an extension of a Module Assignment that goes deeper in data analysis and write-up. The Final Project grade will consist of four components:
    • Final Project Proposal (5%): students will submit a short document in the middle of the term detailing their proposed final project
    • Final Project Proposal Feedback (3%): students will provide detailed written feedback to other students in order to improve the quality of the Final Projects. 
    • Final Project Presentation (7%): students will record 4-minute videos presenting the main methods and findings of their Final Projects. These videos will be presented to the class in the last two weeks of the course.
    • Final Project Write-up (20%): students will submit a detailed write-up of the methods used and results accomplished in their Final Project. In the absence of an approved excuse, late Final Project Write-up submissions will be docked 2% of their value for every hour elapsed since the deadline.
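
As a rough illustration of how the weights and late penalty above combine, here is a minimal Python sketch. It is not an official grade calculator. The function names are hypothetical, and two details are assumptions the syllabus does not state: that partial hours are rounded up to whole hours, and that the 48-hour cutoff for Module Assignments follows from a Tuesday 10:45AM deadline.

```python
import math

# Percent of the final grade per component, as listed in the Grading section.
WEIGHTS = {
    "module_assignments": 60,   # six Module Assignments at 10% each
    "presentation": 5,          # Module Assignment Presentation
    "proposal": 5,              # Final Project Proposal
    "proposal_feedback": 3,     # Final Project Proposal Feedback
    "final_presentation": 7,    # Final Project Presentation
    "final_writeup": 20,        # Final Project Write-up
}

LATE_PENALTY_PERCENT_PER_HOUR = 2

# ASSUMPTION: a Tuesday 10:45AM deadline, so the Thursday 10:45AM
# cutoff for Module Assignments falls 48 hours later.
MODULE_CUTOFF_HOURS = 48


def late_credit_percent(hours_late, cutoff_hours=None):
    """Percent of an assignment's value kept after the late penalty."""
    if hours_late <= 0:
        return 100
    if cutoff_hours is not None and hours_late > cutoff_hours:
        return 0
    # ASSUMPTION: a partial hour counts as a full hour.
    penalty = LATE_PENALTY_PERCENT_PER_HOUR * math.ceil(hours_late)
    return max(0, 100 - penalty)


def final_grade(scores):
    """Weighted final grade (0-100) from component scores on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    # A Module Assignment submitted 5 hours late keeps 90% of its value.
    print(late_credit_percent(5, MODULE_CUTOFF_HOURS))
    # Hypothetical scores: 90 on everything except an 80 on the write-up.
    scores = {name: 90 for name in WEIGHTS}
    scores["final_writeup"] = 80
    print(final_grade(scores))
```

The Final Project Write-up has no stated cutoff, so `cutoff_hours` is left as `None` for it and the credit simply reaches zero once the hourly penalty adds up to 100%.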

Regrade / Errors in grading

It is very important to us that all assignments are properly graded. If you believe there is an error in your assignment grading, please submit an explanation via Canvas to us within 7 days of receiving the grade. No regrade requests will be accepted orally, and no regrade requests will be accepted more than 7 days after you receive the grade for the assignment.

Communication / Getting Help

  • A key responsibility for a student in this course is to use the online Canvas class website and to check it regularly for due dates, updated materials, and corrections. To send urgent messages to everyone in the class, such as corrections to assignments or changes in due dates, I will make use of the email addresses connected to the Canvas site. Students are expected to check their email and the class website regularly.
  • Students who would like to ask a question should email the instructors through the Canvas site. Address questions to "All Instructors" so that they reach all of the course staff; this is the best way to get a response. For technical questions or clarification on assignments, it is best to post on the discussion board so that everyone can see the question and response and possibly offer a suggestion.
  • Students are encouraged to use the discussion board for additional questions outside of class and office hours. Feel free to post questions about anything related to class: module assignments, schedule, or material covered in class. Also feel free to answer questions; the instructor and TA will be actively answering as well. However, do not post potential homework answers. Such posts will be removed immediately and will not be answered.
  • Take advantage of the instructor and TA office hours (posted on the course web page). We will work hard to be accessible to students. Please email us if you need to meet outside of office hours. Don’t be shy if you don’t understand something: attend office hours, send email, or speak up in class!

Schedule

Module                      | Week | Dates           | Topics                                | Due Date
Introduction                | 1    | Jan 10 - Jan 14 | Introductions, Python & Pandas review |
Exploring                   | 2    | Jan 18 - Jan 21 | Exploratory data analysis             | Module Assignment 1 due Feb 1
                            | 3    | Jan 24 - Jan 28 | Interviewing a dataset                |
Cleaning                    | 4    | Jan 31 - Feb 4  | Handling missing data                 | Module Assignment 2 due Feb 15
                            | 5    | Feb 7 - Feb 11  | Reshaping and tidying data            |
Retrieving                  | 6    | Feb 14 - Feb 18 | Web scraping                          | Module Assignment 3 due Mar 1
                            | 7    | Feb 22 - Feb 25 | API scraping                          |
Combining                   | 8    | Feb 28 - Mar 4  | Joins                                 |
Spring Break                |      | Mar 7 - Mar 11  | No class                              | Final Project Proposal due Mar 15
Combining                   | 9    | Mar 14 - Mar 18 | Pivots and groupby aggregation        | Module Assignment 4 due Mar 22
Inferring                   | 10   | Mar 21 - Mar 25 | Hypothesis testing                    | Module Assignment 5 due Apr 5
                            | 11   | Mar 28 - Apr 1  | Causation, counterfactuals            |
Extrapolating               | 12   | Apr 4 - Apr 8   | Time series analysis                  | Module Assignment 6 due Apr 19
                            | 13   | Apr 11 - Apr 15 | Time series II                        |
Final Project Presentations | 14   | Apr 18 - Apr 22 | Final Presentations                   | Final Project due May 4
                            | 15   | Apr 25 - Apr 26 |                                       |

Statistical Computing

Jupyter notebooks written in Python 3 will be used for all in-class examples and assignments. We will be using Google Colab, so students will not need to install Python locally on their machines.

Statements and Links

Academic Misconduct

  • You are bound by the School of Computing’s Academic Misconduct Policy: https://www.cs.utah.edu/academic-misconduct/. You should not use content or ideas from other people without directly citing your source, and your submitted assignments must be your own work (and that of your group, in the case of group assignments). If you are in doubt about whether something is allowed, you should ask the course staff.

  • The School of Computing has instituted a "two strikes and you're out" cheating policy: if you are caught cheating twice in any SoC classes, you will be unable to take any future SoC courses. https://handbook.cs.utah.edu/2019-2020/Academics/policies.php
  • For a detailed description of the university policy on cheating, please see the University of Utah Student Code: http://www.regulations.utah.edu/academics/6-400.html.

College of Engineering Guidelines 

For information on withdrawing from courses, appealing grades, and more, see the College of Engineering guidelines at https://www.coe.utah.edu/students/current/semester-guidelines/.

School of Computing guidelines 

For more information on School of Computing policies and guidelines, please refer to https://handbook.cs.utah.edu/2019-2020/Academics/policies.php

Safety 

The University of Utah values the safety of all campus community members. To report suspicious activity or to request a courtesy escort, call campus police at 801-585-COPS (801-585-2677). You will receive important emergency alerts and safety messages regarding campus safety via text message. For more information regarding safety and to view available training resources, including helpful videos, visit safeu.utah.edu.

Academic Accommodations

  • The University of Utah seeks to provide equal access to its programs, services and activities for people with disabilities. If you will need accommodations in the class, reasonable prior notice needs to be given to the Center for Disability and Access (http://disability.utah.edu, (801) 581-5020). CDA will work with you and the instructor to make arrangements for accommodations. Accommodations cannot be given without paperwork from this office.
  • If you are aware that you qualify as having a disability or believe that you might qualify, we encourage you to reach out to the CDA as soon as possible. You can always choose not to use accommodations recommended by the CDA, and School of Computing faculty and staff are not made aware of your arrangement until you notify them.
  • We also recognize that current circumstances can be very disruptive to established routines and strategies. We are not experts, but we encourage you (if you have not) to consider proactively establishing or re-establishing contact with appropriate groups or professionals in order to explore (for example) what routines or strategies might benefit from being updated given the current global circumstances. Also see the section below on Wellness, Resiliency, Self-Care, and Productivity.

Discrimination and Harassment

Violence and harassment based on sex and gender (which includes sexual orientation and gender identity/expression), race, national origin, color, religion, age, status as a person with a disability, veteran’s status, or genetic information are civil rights offenses. If you or someone you know has been harassed or assaulted, you are encouraged to report the incident to the Office of Equal Opportunity and Affirmative Action (OEO/AA) or to the Office of the Dean of Students. Counseling is available at the University Counseling Center. Resources for general wellness and resiliency are available at the Center for Student Wellness. To report to the police, contact the Department of Public Safety at 801-585-COPS (801-585-2677).

Student Names & Personal Pronouns 

Class rosters are provided to the instructor with the student’s legal name as well as “Preferred first name” (if previously entered by you in the Student Profile section of your CIS account). Please advise me of any name or pronoun changes (and update CIS) so I can help create a learning environment in which you, your name, and your pronoun will be respected. If you need assistance getting your preferred name on your UID card, please visit the LGBT Resource Center, Room 409 in the Olpin Union Building, or email bpeacock@sa.utah.edu to schedule a time to drop by. The LGBT Resource Center hours are M-F 8am-5pm, and 8am-6pm on Tuesdays.

COVID Update

  • The Salt Lake County Health Department has adopted a mask mandate for indoor spaces and while queueing outdoors from Jan. 8 through Feb. 7. This public health order will apply to our campus.
  • Updated guidance for the Spring 2022 Semester is posted in @theU.
  • New self-serve testing sites opened Monday.

Wellness, Resiliency, Self-Care, and Productivity

Let us all take a moment to acknowledge that several past semesters have been rather different because of the pandemic. This semester may be disrupted by the pandemic as well. Some of us may feel that this is an extremely overwhelming and disruptive time. Others may feel that the pandemic does not impact their lives too much. Regardless, maintaining or adopting new ways to proactively practice “self-care” can help maintain or improve your overall wellness and resiliency, which is valuable both for its own sake and because it can help you succeed academically. Additionally, working and studying from home removes some of the structure and “rituals” (e.g., physically going to the classroom) that you may have been accustomed to. You may need new strategies, especially if companies continue to have employees work remotely for a little while.

You might want to consider giving yourself the “homework” - especially early in the semester - of looking through wellness and work-from-home resources/opportunities and then figuring out what works for you. You could think of it as an opportunity to debug and learn how to optimize yourself!

General strategies for wellbeing include things like: getting enough sleep on a consistent schedule, getting enough exercise and sunlight, interpersonal contact, separation of “work” and “play” time and spaces, accountability structure (e.g., regularly attending lecture), and practicing time management (so that you know what you should be working on when, that you have enough time to get things done based on how things are going, and that you can put work down at the end of the day).

We encourage you to dedicate some intentional time to better understand what helps you feel (and do!) your best so that you are well-equipped for whatever the year brings. The University has resources like:

However, you don’t need to be limited by looking through the resources offered by the university!