Course Syllabus
Course Description
This course provides an introduction to data wrangling and prepares students for further courses in the data science curriculum. Students will learn methods and skills involved in data collection, cleaning, and organizing. In addition to the data wrangling methods, students will also learn to think critically about the ethical and social implications of using data.
Learning Objectives
- Obtain data from existing sources such as websites (web-scraping) and platform APIs
- Set up and manage new data collection systems, with the focus on value-driven data collection (i.e., collecting the right data to answer a research question or achieve a business goal)
- Clean up, aggregate, sample, reshape, and normalize data sets using the tidy data framework, statistical techniques, and existing large-scale data analytics tools
- Perform exploratory data analysis on real-world data sets
- Reflect on the applicability of the data wrangling methods in specific contexts and their ethical implications
Course Design
The format of each class will vary between lectures, analysis labs, and in-class activities. Student performance will be evaluated through a combination of Reading Readiness, Module Assignments (MAs), Class Participation, and a Final Project. There is no final exam.
The class is split up into six two-week modules. Each module corresponds to a fundamental data wrangling technique: (1) exploring an existing data set, (2) cleaning data into “tidier” formats, (3) retrieving new data from the web, (4) combining different data sets together, (5) inferring patterns in the data, (6) extrapolating and predicting with time series analysis.
Readings
There is no textbook required for class, but there will be required readings, tutorials, and other material, which will be made available through Canvas. Familiarity with the readings will be assessed through Perusall.
Basic Information
Contact: Post all questions to Canvas (except for private matters)
Canvas: https://utah.instructure.com/courses/975260
Class Time & Location: Monday & Wednesday, 1:25-2:45PM, WEB L103
Office Hours:
Scheduled hours are held most weeks:
- Prof. Kogan: 3-4PM Thursdays, MEB 3140
- Naman: 2-3PM Fridays, MEB 3115
- Sydney: 2-3PM Tuesdays, CADE Lab
Other meetings by appointment.
Grading
Students will be evaluated through four different mechanisms.
- Module Assignments (48%): Module Assignments are intended to develop students’ skill and confidence in using data wrangling tools and written communication to share the findings. There are six Module Assignments in total, one per module. Each Module Assignment is worth 8% of the final grade (48% cumulative), and each is due on the Sunday before the start of the next module. The format and evaluation criteria of each Module Assignment will vary slightly, but all will share two themes: “showing” how you did a data analysis in a tutorial-like format and/or “telling” what you found from a data analysis in a reporting-like format. In the absence of an approved excuse, late submissions will be docked 0.59% of their value for every hour elapsed since the deadline (you get 0 points in about a week).
- Required Course Reading (6%): Familiarity with the required course readings ensures that the students are prepared for the material presented in class. To ensure Reading Readiness, students' familiarity with the required readings will be assessed through Perusall. Please follow a link under each Canvas reading assignment to be taken to specific readings in Perusall. As the readings inform your understanding of the lectures, late reading submissions will not be accepted.
- Class Participation (10%): To encourage students' active engagement with the course material, they will be graded on their class participation through Poll Everywhere in-class polls. Each class's poll will be worth 0.5% of the final grade. Up to 6 absences will not affect students' participation grade (we will drop the 6 lowest polls).
- Final Project (36%): The Final Project is intended to be a portfolio piece highlighting a student’s analytical and communicative abilities. The project can be an extension of a Module Assignment that goes deeper in data analysis and write-up. The Final Project grade will consist of five components:
- Final Project Proposal (7%): students will submit a short document in the middle of the term detailing their proposed final project
- Final Project Proposal Feedback (2%): students will provide light written feedback to other students in order to improve the quality of the Final Project Proposals.
- Final Project Presentation (8%): students will record 5-minute videos presenting the main methods and findings of their Final Projects. Selected videos will be presented to the class on the last day of the course.
- Final Project Presentation Feedback (1%): students will rank and optionally provide light written feedback to other students in order to improve the quality of the Final Project Presentations. Your feedback will help the course staff select the videos to be shown on the last day of class.
- Final Project Write-up (18%): students will submit a detailed write-up of the methods used and results accomplished in their Final Project. In the absence of an approved excuse, late Final Project Write-up submissions will be docked 2% of their value for every hour elapsed since the deadline.
All work should be your own. Use of generative AI for coding or writing is not permitted.
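The hourly late penalties above are linear, so their effect is easy to check. The sketch below is only an illustration of that arithmetic, not the official grading script: the function name `late_fraction_kept` is hypothetical, and it assumes the penalty is prorated continuously over partial hours and never drops a submission below zero.

```python
def late_fraction_kept(hours_late: float, percent_per_hour: float) -> float:
    """Fraction of a submission's value left after a linear hourly penalty.

    Assumption: partial hours are prorated and the value is floored at 0.
    """
    if hours_late <= 0:
        return 1.0
    return max(0.0, 1.0 - (percent_per_hour / 100.0) * hours_late)


# Penalty rates stated in the Grading section.
MA_PERCENT_PER_HOUR = 0.59       # Module Assignments
WRITEUP_PERCENT_PER_HOUR = 2.0   # Final Project Write-up

# Hours until the penalty reaches 100% of the submission's value.
ma_hours_to_zero = 100.0 / MA_PERCENT_PER_HOUR
writeup_hours_to_zero = 100.0 / WRITEUP_PERCENT_PER_HOUR

print(f"Module Assignment is worth 0 after {ma_hours_to_zero:.1f} hours "
      f"({ma_hours_to_zero / 24:.1f} days)")
print(f"Final Project Write-up is worth 0 after {writeup_hours_to_zero:.0f} hours")
print(f"Module Assignment submitted 24 hours late keeps "
      f"{late_fraction_kept(24, MA_PERCENT_PER_HOUR):.1%} of its value")
```

Under these assumptions, a Module Assignment reaches zero after about 169.5 hours (roughly a week, as stated above), while a Final Project Write-up reaches zero after 50 hours.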
Regrade / Errors in grading
It is very important to us that all assignments are properly graded. If you believe there is an error in your assignment grading, please submit an explanation via Canvas to us within 7 days of receiving the grade. No regrade requests will be accepted orally, and no regrade requests will be accepted more than 7 days after you receive the grade for the assignment.
Communication / Getting Help
- A key responsibility for a student in this course is to use the online Canvas class website and to check it regularly for due dates, updated materials, and corrections. To send urgent messages to everyone in the class, such as corrections to assignments or changes in due dates, we will make use of the email addresses connected to the Canvas site. Students are expected to check their email and the class website regularly.
- Students who would like to ask a question should email the instructors through the Canvas site. Questions should be addressed to "All Instructors" so that they reach all the course staff, which is the best way to get a fast response. For technical questions seeking clarification on assignments, it is best to post on the discussion board so that everyone can see the question and response and possibly offer a suggestion.
- Students are encouraged to use the Canvas discussion board for additional questions outside of class and office hours. Feel free to post questions regarding anything related to class: module assignments, schedule, material covered in class. Also feel free to answer questions; the instructor and TA will be actively answering questions as well. However, do not post potential homework answers. Such posts will be removed immediately and will not be answered.
- Take advantage of the instructor and TA office hours (posted on the course web page). We will work hard to be accessible to students. Please send us Canvas mail if you need to meet outside of office hours. Don’t be shy if you don’t understand something: attend office hours, send Canvas mail, or speak up in class!
Schedule
Module | Week | Dates | Topics | Due Dates |
Introduction | 1 | Aug 19 - Aug 23 | Introductions, Python & Pandas review | |
Introduction | 2 | Aug 26 - Aug 30 | Python, Pandas, & Stats review | |
Exploring | 3 | Sep 2 - Sep 6 | Exploratory data analysis | MA1 due 9/15 |
Exploring | 4 | Sep 9 - Sep 13 | Interviewing a dataset | |
Cleaning | 5 | Sep 16 - Sep 20 | Handling missing data | MA2 due 9/29 |
Cleaning | 6 | Sep 23 - Sep 27 | Reshaping and tidying data | |
Retrieving | 7 | Sep 30 - Oct 4 | Web scraping | Final Project Proposal due 10/6; MA3 due 10/20 |
Fall Break | 8 | Oct 6 - Oct 13 | No class | |
Retrieving | 9 | Oct 14 - Oct 18 | APIs | |
Combining | 10 | Oct 21 - Oct 25 | Joins | MA4 due 11/3 |
Combining | 11 | Oct 28 - Nov 1 | Pivots and groupby aggregation | |
Inferring | 12 | Nov 4 - Nov 8 | Hypothesis testing | MA5 due 11/17 |
Inferring | 13 | Nov 11 - Nov 15 | Causation, counterfactuals | |
Extrapolating | 14 | Nov 18 - Nov 22 | Time series analysis | MA6 due 12/1 |
Extrapolating | 15 | Nov 25 - Nov 27 | Time series II | |
Wrap-up | 16 | Dec 2 - Dec 5 | Select Final Presentations, Reflections | Final Project due 12/13 |
Statistical Computing
Jupyter notebooks written in Python 3 will be used for all in-class examples and assignments. We will be using Google Colab, so students will not need to install Python locally on their machines.
Statements and Links
Academic Misconduct
- You are bound by the School of Computing’s Academic Misconduct Policy: https://www.cs.utah.edu/academic-misconduct/. You should not use content or ideas from other people without directly citing your source, and your submitted assignments must be your own work (and that of your group, in the case of group assignments). If you are in doubt about whether something is allowed, you should ask the course staff.
- The School of Computing has instituted a "two strikes and you're out" cheating policy, meaning that if you are caught cheating twice in any SoC classes, you will be unable to take any future SoC courses. https://handbook.cs.utah.edu/2019-2020/Academics/policies.php
- For a detailed description of the university policy on cheating, please see the University of Utah Student Code: http://www.regulations.utah.edu/academics/6-400.html.
College of Engineering Guidelines
For information on withdrawing from courses, appealing grades, and more, see the College of Engineering guidelines at https://www.coe.utah.edu/students/current/semester-guidelines/.
School of Computing Guidelines
For more information on School of Computing policies and guidelines, please refer to https://handbook.cs.utah.edu/2019-2020/Academics/policies.php
Safety
The University of Utah values the safety of all campus community members. To report suspicious activity or to request a courtesy escort, call campus police at 801-585-COPS (801-585-2677). You will receive important emergency alerts and safety messages regarding campus safety via text message. For more information regarding safety and to view available training resources, including helpful videos, visit safeu.utah.edu.
Academic Accommodations
- The University of Utah seeks to provide equal access to its programs, services and activities for people with disabilities. If you will need accommodations in the class, reasonable prior notice needs to be given to the Center for Disability and Access (http://disability.utah.edu, (801) 581-5020). CDA will work with you and the instructor to make arrangements for accommodations. Accommodations cannot be given without paperwork from this office.
- If you are aware that you qualify as having a disability or believe that you might qualify, we encourage you to reach out to the CDA as soon as possible. You can always choose not to use accommodations recommended by the CDA, and School of Computing faculty and staff are not made aware of your arrangement until you notify them.
- We also recognize that current circumstances can be very disruptive to established routines and strategies. We are not experts, but we encourage you (if you have not) to consider proactively establishing or re-establishing contact with appropriate groups or professionals in order to explore (for example) what routines or strategies might benefit from being updated given the current global circumstances. Also see the section below on Wellness, Resiliency, Self-Care, and Productivity.
Discrimination and Harassment
Violence and harassment based on sex and gender (which includes sexual orientation and gender identity/expression), race, national origin, color, religion, age, status as a person with a disability, veteran’s status, or genetic information are civil rights offenses. If you or someone you know has been harassed or assaulted, you are encouraged to report the incident to the Office of Equal Opportunity and Affirmative Action (OEO/AA) or to the Office of the Dean of Students. Counseling is available at the University Counseling Center. Resources for general wellness and resiliency are available at the Center for Student Wellness. To report to the police, contact the Department of Public Safety at 801-585-COPS (801-585-2677).
Student Names & Personal Pronouns
Class rosters are provided to the instructor with the student’s legal name as well as “Preferred first name” (if previously entered by you in the Student Profile section of your CIS account). Please advise me of any name or pronoun changes (and update CIS) so I can help create a learning environment in which you, your name, and your pronoun will be respected. If you need assistance getting your preferred name on your UIDcard, please visit the LGBT Resource Center Room 409 in the Olpin Union Building, or email bpeacock@sa.utah.edu to schedule a time to drop by. The LGBT Resource Center hours are M-F 8am-5pm, and 8am-6pm on Tuesdays.
Wellness, Resiliency, Self-Care, and Productivity
Let us all take a moment to acknowledge that several past semesters have been different to various degrees because of the pandemic. For some students, this semester may be disrupted by the pandemic as well. Some of us may feel that this is still a disruptive time. Others may feel that the pandemic does not impact their lives too much. Regardless, maintaining or adopting new ways to proactively practice “self-care” can help maintain or improve your overall wellness and resiliency, which is valuable both for its own sake and because it can help you succeed academically. Additionally, working and studying from home removes some of the structure and “rituals” that you may have been accustomed to. You may need new coping strategies.
You might want to consider giving yourself the “homework” — especially early in the semester — of looking through wellness and work-from-home resources/opportunities and then figuring out what works for you. You could think of it as an opportunity to debug and learn how to optimize yourself!
General strategies for wellbeing include things like: getting enough sleep on a consistent schedule, getting enough exercise and sunlight, interpersonal contact, separation of “work” and “play” time and spaces, accountability structure (e.g., regularly attending lecture), and practicing time management (so that you know what you should be working on when, that you have enough time to get things done based on how things are going, and that you can put work down at the end of the day).
We encourage you to dedicate some intentional time to better understand what helps you feel (and do!) your best so that you are well-equipped for whatever the year brings. The University has resources like:
- Center for Student Wellness
- Mindfulness Center
- Online Fitness Services at Campus Recreation Services
- University Counseling Center
However, you don’t need to be limited by looking through the resources offered by the university!