Reading and parsing CSV files in Python

From Knowledge Kitchen
Jump to navigation Jump to search


Python has many ways to read data stored in a comma-separated values (CSV) text file.

Example scenario

All examples in this document assume you have comma-separated list of values in a data file similar to that below. This data represents information about students in a university course.

The goal

Programs in this document show various ways to read this data and perform a simple analysis, including:

  • print the names of all Senior students from Argentina
  • print how many such students were found

The data set

students.csv:

First Name,Last Name,Email,Country,Year
Elga,L'Episcopi,elepiscopi0@mysql.com,Hungary,Junior
Adolph,Cornford,acornford1@usgs.gov,China,Freshman
Edna,Crone,ecrone2@ihg.com,Indonesia,Freshman
Jaquith,Duly,jduly3@disqus.com,Russia,Senior
Elton,Sleep,esleep4@de.vu,Democratic Republic of the Congo,Sophomore
Randi,Bettington,rbettington5@is.gd,United States,Junior
Pris,Rielly,prielly6@economist.com,Indonesia,Freshman
Dwain,Wittey,dwittey7@sourceforge.net,Argentina,Senior
Jeno,O'Crevan,jocrevan8@dagondesign.com,Bosnia and Herzegovina,Sophomore
Malvin,Schuck,mschuck9@histats.com,South Korea,Sophomore
Ichabod,Braunthal,ibraunthala@senate.gov,Thailand,Sophomore
Erich,Matzl,ematzlb@princeton.edu,Canada,Junior
Merlina,Dibble,mdibblec@lycos.com,China,Sophomore
Heather,Aldersley,haldersleyd@smugmug.com,Philippines,Senior
Stephi,Moseby,smosebye@issuu.com,Poland,Sophomore
Orville,Labden,olabdenf@simplemachines.org,Micronesia,Junior
Matthew,Fitzer,mfitzerg@vinaora.com,China,Sophomore
Ricki,Kinnoch,rkinnochh@ifeng.com,Indonesia,Junior
Kylie,Marriott,kmarriotti@cloudflare.com,Philippines,Senior
Verge,Quadling,vquadlingj@dell.com,Finland,Sophomore
Junia,Cheek,jcheekk@apache.org,Philippines,Senior
Kahlil,Olczyk,kolczykl@china.com.cn,Canada,Senior
Dulcea,Dwane,ddwanem@tripod.com,Russia,Freshman
Sharon,De Courtney,sdecourtneyn@canalblog.com,Colombia,Senior
Eleonore,Linsay,elinsayo@zimbio.com,China,Sophomore
Aubrey,Glastonbury,aglastonburyp@nifty.com,China,Junior
Leah,Fishleigh,lfishleighq@answers.com,Philippines,Senior
Feodor,Lawie,flawier@admin.ch,Azerbaijan,Freshman
Coletta,Fantham,cfanthams@sbwire.com,Sweden,Senior
Wenona,Sandeford,wsandefordt@ucsd.edu,Indonesia,Sophomore
Petrina,Djurisic,pdjurisicu@mozilla.com,Japan,Junior
Verile,Barriball,vbarriballv@omniture.com,Indonesia,Junior
Sky,Brigshaw,sbrigshaww@1und1.de,China,Senior
Patti,Peracco,pperaccox@amazon.com,Ethiopia,Senior
Sergio,Wenn,swenny@scribd.com,United States,Senior
Catarina,Langhorn,clanghornz@shinystat.com,Czech Republic,Senior
Salem,Mounsey,smounsey10@mysql.com,Russia,Senior
Burty,Hebblethwaite,bhebblethwaite11@rakuten.co.jp,Russia,Sophomore
Sarge,Crookes,scrookes12@sakura.ne.jp,France,Freshman
Dee,Parrington,dparrington13@engadget.com,China,Sophomore
Galvan,Blasius,gblasius14@geocities.jp,Bosnia and Herzegovina,Sophomore
Marcy,Birtonshaw,mbirtonshaw15@deliciousdays.com,Indonesia,Junior
Lenci,Tubbs,ltubbs16@uiuc.edu,Slovenia,Junior
Cahra,Zottoli,czottoli17@ebay.com,Vietnam,Freshman
Brewer,Petrovic,bpetrovic18@addtoany.com,China,Senior
Rockwell,Danieli,rdanieli19@purevolume.com,Belarus,Freshman
Allene,MacSkeaghan,amacskeaghan1a@nih.gov,Indonesia,Freshman
Peter,Dowglass,pdowglass1b@google.co.uk,Russia,Junior
Meryl,Keefe,mkeefe1c@berkeley.edu,China,Sophomore
Rafael,Legerton,rlegerton1d@go.com,Mali,Junior

Roll your own

If you didn't know any better, you could read and analyze data from CSV files manually using basic Python:

 1 #open the csv file in read mode
 2 csvfile = open("students.csv", "r")
 3 
 4 #skip the first line, since it contains the column headings
 5 csvfile.readline()
 6 
 7 # nice heading for the output
 8 print(":::::Seniors from Argentina:::::")
 9 
10 #counter of Seniors from New York
11 argentina_senior_counter = 0
12 
13 # loop through each line in the file
14 for line in csvfile:
15     
16     #split each line into a list, according to the commas
17     row = line.split(",")
18     
19     #if any row contains a Senior from Argentina, print out the name
20     #each row is a list
21     if row[3] == "Argentina" and row[4] == "Senior":
22 
23         #increment the counter of seniors from Argentina
24         ny_senior_counter += 1
25 
26         #print out the name
27         print(row[0])
28 
29         
30 #output how many we found
31 print("Found", argentina_senior_counter, "seniors from Argentina.")

CSV to Lists

Python interpreters generally come with a standard library for dealing with files in CSV format. This library can easily read csv data into Lists, which saves you a few steps:

 1 #import python's built-in csv library
 2 import csv
 3 
 4 #open the csv file in read mode
 5 csvfile = open('students.csv', 'r')
 6 
 7 #create a csv reader with standard settings for a csv
 8 reader = csv.reader(csvfile, delimiter=',', quotechar='"')
 9 
10 # nice heading for the output
11 print(":::::Seniors from Argentina:::::")
12 
13 #row counter
14 row_counter = 0
15 
16 #counter of Seniors from Argentina
17 argentina_senior_counter = 0
18 
19 #loop through each row in the csv file... each row comes as a list
20 for row in reader:
21 
22     #increment the counter for each row we iterate through
23     row_counter += 1
24     
25     #skip the first row, since it contains the column headers
26     if row_counter == 1:
27         continue
28 
29     #if any row contains a Senior from Argentina, print out the name
30     #each row is a list
31     if row[1] == "Argentina" and row[2] == "Senior":
32 
33         #increment the counter of seniors from Argentina
34         argentina_senior_counter += 1
35 
36         #print out the name of this student
37         print(row[0])
38     
39 #output how many we found
40 print("Found", argentina_senior_counter, "seniors from Argentina.")

CSV to Dictionaries

Alternatively, and perhaps preferably, you can use Python's csv library to read all CSV file data into Dictionaries, which are more intuitive to humans:

 1 #import python's built-in csv library
 2 import csv
 3 
 4 # open the file in read mode
 5 csvfile = open('students.csv', 'r')
 6 
 7 # convert the data in this file into a DictReader
 8 reader = csv.DictReader(csvfile)
 9 
10 # nice heading for the output
11 print(":::::Seniors from Argentina:::::")
12 
13 #counter of Seniors from Argentina
14 argentina_senior_counter = 0
15 
16 # loop through each row in the csv file... each row comes as a  dictionary
17 # the column names in the first line of the file are magically used  as the keys in the dictionary for each row.
18 for row in reader:
19     
20     #if any row contains a Senior from Argentina, print out the name
21     #each row is a dictionary    
22     if row["Origin"] == "Argentina" and row["Year"] == "Senior":
23         
24         #increment the counter of seniors from Argentina
25         argentina_senior_counter += 1
26         
27         #print out the name of this student
28         print(row["Name"])
29         
30 
31 #output how many we found
32 print("Found", argentina_senior_counter, "seniors from Argentina.")

CSV to pandas DataFrame

Why reinvent the wheel? Pandas, the data analysis library, contains a powerfully simple CSV functionality. This library can read CSV data into pandas' unique DataFrame data structure, which has similar functionality to a spreadsheet or database table, and can drastically reduce the amount of custom code you need to write.

 1 #import the pandas data analytics library
 2 import pandas as pd
 3 
 4 # read the csv data into a pandas DataFrame data structure
 5 df = pd.read_csv("data.csv", sep=',')
 6 
 7 #create a filter - returns a pandas Series data structure containing True and False values  for each row indicating whether they match
 8 seniors_from_argentina = (df["Year"] == "Senior") & (df["Country"] == "Argentina")
 9 
10 #apply the filter to all the data in the "First Name" Series
11 #this returns a Series with only the matches listed
12 matches = df["First Name"][seniors_from_argentina]
13 
14 # nice heading for the output
15 print(":::::Seniors from Argentina:::::")
16 
17 #print out the names of the matching students
18 print(matches)
19 
20 #output how many we found
21 print("Found", matches.count(), "seniors from Argentina.")


What links here