Datasets and Code

This section contains the datasets and code needed to complete the lab exercises in the Lectures and Labs section. The files can also be found in this GitHub repository.

Datasets

Original location (as described in the lab handouts): dataiap/datasets

DATASETS | COMMENTS
2008 Presidential Campaign Contributions (ZIP) (This ZIP file contains: 1 .xls file.) | In the public domain, from the Federal Election Commission.
2011 County Health Rankings (ZIP) (This ZIP file contains: 3 .xls files, 1 .txt file, and 3 .py files.) | 2011 County Health Ranking National Data.xls © County Health Rankings & Roadmaps, ols.py © Vincent Nijs. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/fairuse.
Enron email dataset (TGZ - 432 MB), plus the subsets below | In the public domain, from the Federal Energy Regulatory Commission. The history of the Enron dataset is described here.

Subsets of the Enron dataset:

  • kenneth.zip (ZIP) (This ZIP file contains: 4166 .txt files.)
  • kenneth_json.zip (ZIP) (This ZIP file contains: 1 .json file.)

Code

Original locations (as described in the lab handouts):

  • dataiap/dayX
  • dataiap/resources

The ZIP files for Days 3 and 5 include hypothesis_testing.py, regression.py, and mapreduce.py. These are the source files for the lab handouts and are included here only for convenience; they contain no material beyond what is in the handouts.

CODE FILES | COMMENTS
Day 3 (ZIP) (This ZIP file contains: 4 .py files.) | ols.py © Vincent Nijs, welchttest.py © Angus McMorland. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/fairuse.
Day 4 (ZIP) (This ZIP file contains: 1 .py file.)
Day 5 (ZIP) (This ZIP file contains: 7 .py files and 1 .gz file.)
Resources (ZIP) (This ZIP file contains: 6 .py files and 2 .json files.)