Open Data Assignment

About NYC Open Data

New York City has an Open Data program. From the horse's mouth:

NYC Open Data makes the wealth of public data generated by various New York City agencies and other City organizations available for public use.

Assignment idea

Data massage

Your assignment: take any tabular data series from NYC Open Data and allow a user of your program to do some basic data mining on it.

Note that you are free to massage the data before using it as input to your program.


The following is only an example; do not use this data set in your own work.

By converting data from the 2013 Campaign Contributions table into a multidimensional array in code, you could create a program that lets a user search for any candidate for public office by name and see who donated to that candidate and how much each person gave.

Example session (user input in bold; example data is fictitious):

Welcome to the Campaign Contribution Finder App.
This app mines data from
We show you individual contributors to any candidate for public office, and how much they gave.

Please enter a candidate's first and last name: Anthony Weiner

Found 1,233 contributors to Anthony Weiner's campaign.  Showing results 1-10:
   Andrea, Mary,     $2,100
   Asher,  Mein,     $21
   Butter, Kyle,     $299
   Fisher, Joel,     $36
   Gangi,  Roy,      $1
   Katz,   Harry,    $984
   Lau,    Julia,    $50
   Pagani, Marcos,   $841
   Russo,  Nicholas, $10
   Zelik,  Moshe,    $2
...hit enter to see the next 10...

In particular, four columns in this public data set contain the necessary information: "CANDFIRST" and "CANDLAST" hold the candidate's first and last name, "NAME" holds the contributor's name, and "AMNT" holds the amount given.

Minimal requirements

Data set

  • Use a data set with at least 100 rows of data.
  • No two students may use the same data set; coordinate with each other using all available communication channels.
  • The data must be stored in an external file in CSV (comma-separated values) format.
  • You are not allowed to use any CSV parsing libraries/APIs.
  • Comments in the code must indicate the URL of the public data used, as well as the specific columns used.
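Because CSV libraries are off limits, each line of the file has to be split by hand. The following is a minimal sketch of one way to do that, not a required approach. The class name CsvLineSplitter and the method splitLine are hypothetical, and the sketch assumes that a field may be wrapped in double quotes (so it can contain commas) and that a doubled quote inside a quoted field stands for one literal quote. The sample line is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one CSV line into fields without a CSV library.
class CsvLineSplitter {

    // Assumption: fields may be quoted with "..." and "" inside quotes is a literal quote.
    static String[] splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote
                        i++;                 // skip the second quote
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field on the line
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Made-up sample line in the spirit of the example above.
        for (String field : splitLine("Anthony,Weiner,\"Andrea, Mary\",2100")) {
            System.out.println(field);
        }
    }
}
```

If your chosen data set never quotes its fields, a plain split on commas may be enough; check the raw file before deciding.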

Internal representation of data

  • When your program reads the data from the CSV file, it must convert the data into a multidimensional array in code.
  • The length of the array must not be hard-coded; it must be sized programmatically to fit the data.
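One way to meet both requirements is to read all lines first and then allocate the array from the line count. The sketch below is illustrative only: TableLoader, toTable, and load are hypothetical names, and it assumes fields contain no embedded commas, so a plain split is sufficient. The header row, if the file has one, ends up in row 0.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: turns the lines of a CSV file into a String[][].
class TableLoader {

    // The outer array is sized from the number of lines, never hard-coded.
    static String[][] toTable(List<String> lines) {
        String[][] table = new String[lines.size()][];
        for (int i = 0; i < lines.size(); i++) {
            // Assumption: no quoted fields containing commas.
            // The -1 limit keeps trailing empty fields instead of dropping them.
            table[i] = lines.get(i).split(",", -1);
        }
        return table;
    }

    // Reads the whole file, then builds the table.
    static String[][] load(String path) throws IOException {
        return toTable(Files.readAllLines(Paths.get(path)));
    }
}
```

A call such as TableLoader.load("contributions.csv") would return the table, where the file name is a placeholder for whatever you name your own data file.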

Data mining

  • Allow the user to query this data in at least one way (for example, from a candidate's name to that candidate's contributors), as in the example above.
  • Output clear instructions to the user on how to use the program and what to expect in terms of output.
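A query like the one in the example session can be sketched as two passes over the table: one to count the matching rows and one to copy them into a result array of exactly that size. The names CandidateSearch, findRows, and matches are hypothetical, the column indexes are parameters because they depend on your data set, and the case-insensitive comparison is an assumption about what users would expect.

```java
// Hypothetical helper: finds all rows whose candidate name matches the user's input.
class CandidateSearch {

    static String[][] findRows(String[][] table, int firstCol, int lastCol, String fullName) {
        String wanted = fullName.trim();

        // Pass 1: count matches so the result array can be sized exactly.
        int count = 0;
        for (String[] row : table) {
            if (matches(row, firstCol, lastCol, wanted)) {
                count++;
            }
        }

        // Pass 2: copy references to the matching rows.
        String[][] result = new String[count][];
        int next = 0;
        for (String[] row : table) {
            if (matches(row, firstCol, lastCol, wanted)) {
                result[next++] = row;
            }
        }
        return result;
    }

    // Assumption: a match means "first last" equals the input, ignoring case
    // and surrounding spaces. Rows that are too short are skipped.
    static boolean matches(String[] row, int firstCol, int lastCol, String wanted) {
        if (row.length <= Math.max(firstCol, lastCol)) {
            return false;
        }
        String name = row[firstCol].trim() + " " + row[lastCol].trim();
        return name.equalsIgnoreCase(wanted);
    }
}
```

The header row is never returned unless the user happens to type the header text, so it does not need special handling here.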

Output results

  • Format all output nicely using the printf method.
  • Paginate the results, displaying no more than 10 results per page, as indicated in the example output.
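The two output requirements can be combined in a small helper that prints one page at a time with printf. This is a sketch under stated assumptions: Pager, pageCount, and printPage are hypothetical names, the column widths are arbitrary, the amount column is assumed to hold plain numbers that Double.parseDouble accepts, and Locale.US is chosen so the thousands separator is a comma as in the example session.

```java
import java.util.Locale;

// Hypothetical helper: prints query results ten rows at a time.
class Pager {
    static final int PAGE_SIZE = 10;

    // Number of pages needed for totalRows results (rounds up).
    static int pageCount(int totalRows) {
        return (totalRows + PAGE_SIZE - 1) / PAGE_SIZE;
    }

    // Prints the given 0-based page and returns how many rows were printed.
    static int printPage(String[][] rows, int page, int nameCol, int amountCol) {
        int start = page * PAGE_SIZE;
        int end = Math.min(start + PAGE_SIZE, rows.length);
        for (int i = start; i < end; i++) {
            // Assumption: the amount column parses as a number.
            double amount = Double.parseDouble(rows[i][amountCol]);
            // %-25s left-aligns the name; %,10.2f right-aligns the amount with separators.
            System.out.printf(Locale.US, "   %-25s $%,10.2f%n", rows[i][nameCol], amount);
        }
        return Math.max(0, end - start);
    }
}
```

A driver loop could call printPage for page 0, wait for the user to press enter (for example with Scanner.nextLine), and continue until pageCount pages have been shown.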

Extra credit

Support three different types of data queries. In the campaign contribution example above, this could be achieved by adding the following functionality to the candidate search:

  • allowing the user to search by contributor name in order to see all the candidates that contributor gave money to
  • allowing the user to enter a minimum and maximum dollar amount and see all the contributors who gave an amount within that range, along with the candidate each one gave to
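As an illustration of the dollar-range query, the sketch below filters rows whose amount falls between a minimum and a maximum, inclusive. AmountRangeQuery, findInRange, and inRange are hypothetical names, the inclusive bounds are an assumption, and rows whose amount does not parse as a number (such as a header row) are simply skipped.

```java
// Hypothetical helper: finds rows whose amount lies within [min, max].
class AmountRangeQuery {

    static String[][] findInRange(String[][] table, int amountCol, double min, double max) {
        // Pass 1: count matches so the result array is sized to fit.
        int count = 0;
        for (String[] row : table) {
            if (inRange(row, amountCol, min, max)) {
                count++;
            }
        }

        // Pass 2: collect the matching rows.
        String[][] result = new String[count][];
        int next = 0;
        for (String[] row : table) {
            if (inRange(row, amountCol, min, max)) {
                result[next++] = row;
            }
        }
        return result;
    }

    // Assumption: both bounds are inclusive; unparseable amounts never match.
    static boolean inRange(String[] row, int amountCol, double min, double max) {
        if (row.length <= amountCol) {
            return false;
        }
        try {
            double amount = Double.parseDouble(row[amountCol].trim());
            return amount >= min && amount <= max;
        } catch (NumberFormatException e) {
            return false; // e.g. the header row
        }
    }
}
```

The contributor-name query follows the same count-then-copy pattern, comparing the contributor column instead of the amount.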
