In this course, we will explore the moral, social, and ethical ramifications of the choices we make at the different stages of the data analysis pipeline, from data collection and storage to understanding feedback loops in analysis. Through class discussions, case studies, and exercises, students will learn the basics of ethical thinking in science, understand the history of ethical dilemmas in scientific work, and study the distinct challenges associated with ethics in modern data science.
When: MWF 9:40-10:30
Where: WEB L110
Instructor: Suresh Venkatasubramanian
Co-Instructor: Katie Shelef
Office hours: (SV) MEB 3404, MF 3-4pm (note that on holidays like Labor Day there will be no office hours)
Text: Ethics for the Information Age, 7th Ed., by Michael Quinn
- Weekly writing (or coding) assignments (60%)
- Class participation (including scribe notes) (10%)
- Project (2 people max): a case study of ethical decision-making in data analysis (30%)
- A quick tour through the foundations of ethics
- The data collection process
- Doing ethical data analysis
- Acting on your predictions
- Remedies and responsibilities
Guidelines for Class Discussion
In this class we will often discuss issues that are controversial, that involve diverse and strongly held beliefs, and that address deeply personal questions of identity and culture. While we want to have a healthy and vigorous debate, we must be able to express our views without attacking others personally. To that end, I've prepared some guidelines for class discussion.
Class rosters are provided to the instructor with each student’s legal name as well as a “Preferred first name” (if you have entered one in the Student Profile section of your CIS account). While CIS refers to this as merely a preference, I will honor you by referring to you by the name and pronoun that feel best for you in class, on papers, exams, group projects, etc. Please advise me of any name or pronoun changes (and please update CIS) so I can help create a learning environment in which you, your name, and your pronoun are respected.
Assignments will for the most part be essays that answer specific questions based on the assigned readings. Each assignment will generally be no more than 2 pages long (11 pt, single-spaced) and should be submitted electronically in PDF format (either generated directly or exported from another editor).
Assignments will be graded based on your facility in:
- summarizing the problem statement or issue
- considering the context and assumptions inherent in the topic
- communicating your own perspective or position
- justifying your answer with evidence
- using other perspectives to add context to your answer
- following through on implications and consequences where they lead you
- communicating effectively (with good organization, clean presentation, and effective language)
For your project, I'd like you to undertake a more detailed analysis of the ethical considerations in a data science setting of your choice. As examples of what you might aspire to (although you may not be able to match the level of detail in these articles), here are three case studies developed by the Council on Big Data, Ethics and Society:
- No Encore for Encore? Ethical questions for web-based censorship measurement
- The Ethics of Using Hacked Data: Patreon’s Data Hack and Academic Data Standards
- "It was a matter of life or death": A YouTube engineer's decision to alter data in the "It Gets Better" project
These are merely examples of how you might approach a particular scenario; feel free to choose other topics or formats.
Aug 23: Introduction to the class and logistics; overview of the course
- Reading: Physiognomy's new clothes
Aug 25: Discussion of reading, introduction to ethical frameworks
- Reading: EIA Chapter 2.1-2.4
Aug 28: Utilitarianism (by action and by rule)
- Reading: EIA Chapter 2.7-2.8, IEP section on utilitarianism.
- Reading: Ethical guidelines for driverless cars in Germany (this will be the topic of the first writing assignment).
- Interactive: Try out the Moral Machine version of the trolley car problem.
Aug 30: Utilitarianism (continued), social contracts.
- Reading: EIA Chapter 2.9
Sep 1: Rawls and ideas of fairness in society
- Reading: EIA Chapter 2.10
- Discussion: the ethics of adblock
Sep 6: Kant, deontological ethics and the categorical imperative
- Reading: EIA Chapter 2.6
- Other material:
Sep 8: Kant, continued.
- Reading: EIA Chapter 2.10
- Discussion: The Facebook mood experiment.
- Assignment: The ethics of piracy (due Sep 15)
Sep 11: Virtue Ethics
- Reading: Virtue Ethics (from the IEP)
- Optional Reading: Virtue Ethics in the eastern tradition (Hinduism, Buddhism and Confucianism)
Sep 13: Data Collection: where do ethical conundrums arise in the process of collecting data?
Sep 15: Data Collection (continued)
- Discussion: Car Wars
- Reading: The Ethics of Internet Software and Consumer Privacy
Sep 18: Data as commodity: Data Brokers
- Background: FTC report on data brokers, a brief snapshot of what they collect.
- Discussion: The Equifax hack (background, a critique of Equifax's response, and the mutability of data)
- Extra: What does Facebook think it knows about you (Chrome Plugin)
Sep 20: Data as (personal) property: De-identification and anonymization
- Background: Terminology
- Guide to de-identifying data for medical purposes.
- The paper that deanonymized Netflix data, and an FAQ for this paper.
Sep 22: Data as property (continued).
- Assignment: The Chinese social credit score.
Week of Sep 25-29: Suresh is traveling (as it turns out, to an event on AI ethics).
Oct 2: Data as public resource: News and Medicine
- The All of Us initiative at the NIH to create a 1M-person cohort for precision medicine. The initiative's statement of privacy and trust.
- Facebook human curation of trending news is biased?
- Facebook automated curation of trending news is broken?
- Optional Reading: Net neutrality (on the Internet as a shared resource)
- Feedback: Evaluating the Sesame Credit assignment.
Oct 4: The Ethics of Data Analysis: Science and Behavior
Oct 6: The Ethics of Data Analysis: Science and Behavior, continued
Oct 16: The Mechanics of Data Analysis: Collection
- The Hidden Biases in Big Data
- Raw Data is an Oxymoron (introduction to a book)
Oct 18: The Mechanics of Data Analysis: Model building
- Reading: How Big Data is Unfair
Oct 20: The Mechanics of Data Analysis: Prediction and Feedback
- Reading: Debugging Machine Learning
Oct 23: Data is humans: The history of human experimentation
Oct 25: Codes of ethics in medical experimentation
- The Nuremberg code
- IRBs and informed consent.
- The Belmont Report.
- The Declaration of Helsinki
Oct 27: Modern experiments on humans
- The Facebook emotions experiment
- Facebook's election "nudging"
- On the ethics of A/B testing.
- The Menlo Report - The Belmont Report for Information and Communications Technology Research
- Tensions between traditional IRBs and social science and data science researchers
Nov 13: Auditing black-box models
- Random Forests (see Section 10)
- Deep Learning.
Nov 15: Auditing black-box models (continued)
Nov 20: Codes of conduct
Nov 22: Fiduciary Roles
- Medical fiduciaries
- The idea of an information fiduciary
Nov 27: Data scientists as security consultants
Nov 29: A scenario
Dec 1: No class. Instead, you should all attend this talk on the foundations of data science.