Research Topics in Interactive Data Analysis

The increasing scale and accessibility of digital data – including government records, corporate databases, and logs of online activity – provide an under-exploited resource for improving governance, business, academic research, and our personal lives. For such data to prove broadly useful, people from a variety of backgrounds must be able to make sense of it. Facilitating the analysis of large and diverse data sets is a fundamental challenge in both computer systems and human-computer interaction research, and requires the design of new tools for exploring, analyzing, and communicating data.

This course will explore how a broad class of data analysts might work more effectively with data through novel interactive tools. The class will be interdisciplinary in nature, with a goal of identifying and pursuing new research opportunities. To this end, we will touch on diverse topics such as data management (analytic databases, text analysis), user interface techniques (programming-by-demonstration, visualization), and human-centered issues (perceptual, cognitive, and social factors).


The Final Project Presentations will be held Monday, June 6, 5:30-7pm in 124 Wallenberg. Snacks and socializing begin at 5pm; presentations will commence promptly at 5:30pm.

Course Structure

The course will consider both research and practice. We will have weekly assigned readings, and each student is expected to serve as a discussion leader at least once during the quarter. Students are also expected to complete a research project exploring a novel approach to interactive analysis. In addition to discussing seminal and late-breaking research results, each week will feature a guest lecture from a practitioner at the forefront of "data science".

Lecture Schedule

Week 1: Analytic Thinking (Readings)
  Mar 28 Course Introduction (Slides)
  Mar 30 Guest Lecture: Stuart Card (Stanford, PARC) (Slides)

Week 2: Data Collection (Readings)
  Apr 4 Discussion (Slides)
  Apr 6 Guest Panel: Kuang Chen (UC Berkeley), Christine Robson (IBM Research),
          Selina Tobaccowala & Philip Garland (SurveyMonkey)

Week 3: Data Cleaning & Transformation (Readings)
  Apr 11 Discussion (Slides)
  Apr 13 Guest Lecture: David Huynh (Google)

Week 4: Data Integration (Readings)
  Apr 18 Guest Lecture: Alon Halevy (Google)
  Apr 20 Discussion (Slides)

Week 5: Visual Analysis & Big Data, Part 1 (Readings)
  Apr 25 Guest Lecture: Jock Mackinlay (Tableau)
  Apr 27 Guest Lecture: Jeff Hammerbacher (Cloudera)

Week 6: Visual Analysis & Big Data, Part 2 (Readings)
  May 2 Discussion (Slides)
  May 4 Project Presentations

Week 7: Analysis Practices (Readings)
  May 9 Guest Lecture: Brian Dolan (Discovix)
  May 11 Discussion

Week 8: Social Network Analysis (Readings)
  May 16 Guest Lecture: Jure Leskovec (Stanford)
  May 18 Discussion (Slides)

Week 9: Text Analysis (Readings)
  May 23 Guest Lecture: John Stasko (Georgia Tech)
  May 25 Discussion (Slides)

Week 10: Communication & Collaboration (Readings)
  May 30 Memorial Day Holiday - No Class
  Jun 1 Discussion (Slides)


Assignments

A0: Course Participation (30%) - Ongoing
A1: A Failure of Analysis (5%) - Due Apr 4, 8am
A2: Analyzing Big Data (15%) - Due Apr 11 (Part 1) & Apr 18 (Part 2)
FP: Final Project (50%) - Ongoing Milestones, Final Submission Due Week of Jun 6

Background Knowledge

The course has no formal prerequisites, but students are expected to be comfortable building user interfaces, using database management systems, and completing significant programming projects. Familiarity with content from any of the following may prove helpful, but is not strictly required: CS448B, CS147/247, CS145/245, CS124/224N, CS224W & CS345.

Required Texts

There are no required books. Instead, we will read papers from multiple sub-disciplines of computer science, available online in PDF format.

Support for Amazon Web Services is provided through the AWS in Education program.
Tableau's data visualization software is provided through the Tableau for Teaching program.