Text Files - Basic Operations (in Python)

Files in Python

Opening a text file

Text files are most commonly opened in one of three modes:

Read mode:

#e.g.
f = open("data.csv", "r") #open in read mode

Write mode:

#e.g.
f = open("data.csv", "w") #open in write mode

Append mode:

#e.g.
f = open("data.csv", "a") #open in append mode
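To make the difference between the three modes concrete, here is a minimal sketch. The temporary folder and the file name modes_demo.txt are assumptions made up for this demo, so that no real data file gets overwritten.

```python
import os
import tempfile

#work in a throwaway folder so no real files are touched (an assumption for this demo)
folder = tempfile.mkdtemp()
path = os.path.join(folder, "modes_demo.txt") #hypothetical file name

#write mode creates the file, or erases it first if it already exists
f = open(path, "w")
f.write("first\n")
f.close()

#append mode keeps the existing contents and adds to the end
f = open(path, "a")
f.write("second\n")
f.close()

#read mode only lets us read, and the file must already exist
f = open(path, "r")
contents = f.read()
f.close()

print(contents)

#opening in write mode again wipes out everything written so far
f = open(path, "w")
f.close()

f = open(path, "r")
contents_after_rewrite = f.read()
f.close()
```

The last step is the one to watch out for: opening an existing file in write mode empties it, even if you never call write().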

Reading the contents of a text file

Read the entire file in one fell swoop:

#e.g.
theFullMonty = f.read() #read entire file

Read just a single line at a time:

#e.g.
line1 = f.readline() #get the first line
line2 = f.readline() #get the second line
#etc...

Loop through all lines:

#e.g.
for line in f:
    print(line) #do something with each line, e.g. print it

Checking whether a file already exists

Python's os module contains useful methods for checking whether any given file already exists in the file system.
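As a small sketch of what that can look like, the example below uses os.path.exists() and os.path.isfile(). The temporary folder and the file name exists_demo.txt are assumptions invented for this demo.

```python
import os
import tempfile

#throwaway folder and hypothetical file name, used only for this demo
folder = tempfile.mkdtemp()
path = os.path.join(folder, "exists_demo.txt")

#the file has not been created yet
exists_before = os.path.exists(path)
print(exists_before)

#create the file by opening it in write mode
f = open(path, "w")
f.write("hello\n")
f.close()

#now the same check succeeds
exists_after = os.path.exists(path)
print(exists_after)

#os.path.exists() is also True for folders; os.path.isfile() is True only for regular files
path_is_file = os.path.isfile(path)
folder_is_file = os.path.isfile(folder)
print(path_is_file, folder_is_file)
```

A check like this is handy before opening a file in write mode, since write mode would otherwise silently erase an existing file.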

Character encoding issues

There are several popular encoding schemes for text files: ASCII, UTF-8, UTF-16, etc. (UTF-8 and UTF-16 are two different ways of encoding the Unicode character set.) Each encoding scheme has a different system for converting characters into the numeric codes that are ultimately stored in the file. In order to read and write to a text file properly from a program, you will need to know which encoding scheme a given text file uses.

The problem

Python's open() function defaults to using whatever the default encoding scheme is on the computer you're using. Often that's not good enough. If you don't specify the correct encoding scheme, your program may crash with a UnicodeDecodeError when it encounters bytes that are not valid in the encoding scheme it is using, or it may silently read the wrong characters.
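If you are curious which default your own computer would use, the following sketch looks it up with the standard locale module. The variable name is made up for illustration, and the answer will differ from machine to machine, so treat the printed value as informational only.

```python
import locale

#ask Python which text encoding this computer prefers by default
#(the result varies by operating system and settings, e.g. "UTF-8" or "cp1252")
default_encoding = locale.getpreferredencoding(False)

print("Default encoding on this computer:", default_encoding)
```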

One solution

In situations where you need to use a specific encoding scheme, the easiest solution is to specify it when opening the file. Python's codecs module provides an open() function that accepts an encoding argument, as shown here. Note that in Python 3, the built-in open() function accepts the same encoding argument, so the codecs module is not strictly required.

import codecs

#the following example opens the file with utf-8 encoding
f = codecs.open("data.csv", mode='r', encoding='utf-8')
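In Python 3, the built-in open() function also takes an encoding argument. The sketch below writes and reads a small file with utf-8, then shows what happens when the same file is read with the wrong encoding. The temporary folder, the file name encoding_demo.txt, and the sample text are all assumptions made up for this demo.

```python
import os
import tempfile

#throwaway folder and hypothetical file name, used only for this demo
folder = tempfile.mkdtemp()
path = os.path.join(folder, "encoding_demo.txt")

#some text containing characters that do not exist in ASCII
text = "café, naïve, 東京\n"

#write the file using utf-8
f = open(path, "w", encoding="utf-8")
f.write(text)
f.close()

#read it back using the same encoding
f = open(path, "r", encoding="utf-8")
round_trip = f.read()
f.close()
print(round_trip)

#reading the same file as ASCII fails, because the non-ASCII bytes are not valid ASCII
f = open(path, "r", encoding="ascii")
try:
    f.read()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
finally:
    f.close()

print("Reading as ASCII failed:", decode_failed)
```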

Strings in Python

Python has lots of useful String-related functions.
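A few of the string functions that the example programs in these notes rely on are sketched below. The sample text and variable names are made up for illustration. Strings cannot be changed in place, so each of these functions returns a new string (or list) and leaves the original alone.

```python
#a messy line of text, as it might come out of a file
raw = "  katz DELI,manhattan,new york\n"

#strip() removes whitespace (including line breaks) from both ends
stripped = raw.strip()
print(stripped)

#lower() and upper() change the case of every letter
lowered = stripped.lower()
uppered = stripped.upper()
print(lowered)
print(uppered)

#title() capitalizes the first letter of each word and lowercases the rest
titled = stripped.title()
print(titled)

#split() breaks a string into a List of smaller strings at a delimiter
parts = stripped.split(",")
print(parts)

#join() does the opposite: it glues a List of strings together with a delimiter
rejoined = " | ".join(parts)
print(rejoined)
```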

Lists in Python

Useful functions

List functions that modify an existing List:
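A short sketch of some commonly used ones follows. The list of grades is made-up sample data. Unlike the string functions, each of these changes the existing List directly rather than returning a new one.

```python
#a made-up list of grades
grades = [85, 22, 100]

#append() adds a value to the end of the List
grades.append(98)      #grades is now [85, 22, 100, 98]

#insert() adds a value at a given position
grades.insert(0, 69)   #grades is now [69, 85, 22, 100, 98]

#remove() deletes the first occurrence of a value
grades.remove(22)      #grades is now [69, 85, 100, 98]

#sort() puts the values in ascending order
grades.sort()          #grades is now [69, 85, 98, 100]

#reverse() flips the order of the values
grades.reverse()       #grades is now [100, 98, 85, 69]

#pop() removes the last value and hands it back
last = grades.pop()    #last is 69, grades is now [100, 98, 85]

print(grades)
print(last)
```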

Example programs

Data files

The following programs assume you have two text files in the same folder:

mydata.txt:

The first line in the text file
The second line in the text file

sloppy_data.csv:

1,Peace Food,Manhattan,New York
2,Bareburger,manhattan,new York
3,Why not,manhattan, New york
4,five guys, Manhattan, New York
5,katz DELI,manhattan,new york

Grab entire contents of a text file

Read the entire contents of the file, and print them out.

#open a text file in read mode
f = open("mydata.txt", "r")

#pull all the data out of the file into a variable
theTextInTheFile = f.read()

#print out the contents of the file
print(theTextInTheFile)

#close the file
f.close()

Loop through each line of a text file and chop off line breaks

Remove the line break from the end of each line, and then print it out.

#open a text file in read mode
f = open("mydata.txt", "r")

#loop through each line in the file, one by one
for line in f:

    #remove the line break from the end of the string
    line = line.rstrip()

    #print out the line
    print(line)

#close the file
f.close()

Loop through all words in a text file

Loop through each word in the file (assuming words are separated by spaces), and print it out.

#open a text file in read mode
f = open("mydata.txt", "r")

#get the full text from the file, then close it
theFullText = f.read()
f.close()

#split the text into a List of words (split() with no arguments splits on any whitespace: spaces, tabs, and line breaks)
words = theFullText.split()

#loop through each word and analyze it
for word in words:

    #print for debugging
    print(word)

Count the occurrences of a given word in a text file

Loop through each word in the file (assuming words are separated by spaces), and print out how many times a search term is found. For simplicity, this program does not strip punctuation, so a word with punctuation attached (e.g. "Ronkonkoma," with a trailing comma) would not be counted as a match.

#open a text file in read mode
f = open("mydata.txt", "r")

#get the full text from the file, then close it
theFullText = f.read()
f.close()

#split the text into a List of words (split() with no arguments splits on any whitespace: spaces, tabs, and line breaks)
words = theFullText.split()

#what word are we looking for?
searchTerm = "Ronkonkoma"

#keep a counter of how many times we found the word that we're looking for
counter = 0

#loop through each word and analyze it
for word in words:

    #check whether the word matches our searchTerm (case-insensitive)
    if word.lower() == searchTerm.lower():

        #if so, increment the counter
        counter = counter + 1

#print out the result
print("We found the word", searchTerm, counter, "times.")

Fix sloppy capitalization in a CSV data file

Loop through each line in a CSV text file, loop through each value in the line and modify it in some way (in this example, we simply convert a value we are searching for to uppercase). Store the modified values in a two-dimensional list. Loop through this two-dimensional list and output the modified values to a file.

#####################################################
#PART 1 - SCRAPE THE DATA FROM A CSV FILE
#####################################################

#open a file in read mode
f = open("sloppy_data.csv", "r")

clean_data = [] #this will store the cleaned up data from all lines

#loop through each line in the file
for line in f:

    new_line = [] #this will store the cleaned up list of values in this one line

    line = line.strip() #get rid of line break on every line
    data = line.split(',') #split by commas to get a list of values

    #loop through every value in this list
    for thing in data:

        #if the value is 'ronkonkoma', convert it to uppercase 'RONKONKOMA'
        if thing == "ronkonkoma":
            thing = thing.upper()

        #append this value to the "clean" list of values in this line
        new_line.append(thing)

    #append this cleaned up data to the list
    clean_data.append(new_line)

#close the file
f.close()


#####################################################
#PART 2 - WRITE THE CLEANED UP DATA TO A NEW CSV FILE
#####################################################

#open a file in write mode
f = open("sloppy_data_fixed.csv", "w")

#loop through each list of cleaned up data... each list represents one line
for line_as_list in clean_data:

    #convert this line (currently stored as a list) to a string with comma-separated values
    line_as_string = ",".join(line_as_list)

    #write this line's data to a file, along with a line break
    f.write(line_as_string + "\n")

#close the file
f.close()
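None of the values in the sample sloppy_data.csv equal "ronkonkoma", so the program above would copy that data unchanged. As a sketch of one way to actually tidy the capitalization, the example below trims the stray spaces around each value with strip() and then applies title(). Using title() is an assumption about what "fixed" should mean here, and the rows are hard-coded (copied from sloppy_data.csv) so the sketch runs without any files.

```python
#the same rows as sloppy_data.csv, hard-coded so this sketch runs on its own
rows = [
    "1,Peace Food,Manhattan,New York",
    "2,Bareburger,manhattan,new York",
    "3,Why not,manhattan, New york",
    "4,five guys, Manhattan, New York",
    "5,katz DELI,manhattan,new york",
]

clean_rows = [] #this will store one cleaned up string per line

#loop through each line of data
for line in rows:

    values = line.strip().split(",") #split by commas to get a list of values
    clean_values = [] #this will store the cleaned up values in this one line

    #loop through every value in this line
    for value in values:

        #trim surrounding spaces, then capitalize the first letter of each word
        clean_values.append(value.strip().title())

    #glue the cleaned up values back together with commas
    clean_rows.append(",".join(clean_values))

#print out the cleaned up lines
for row in clean_rows:
    print(row)
```

Be aware that title() is a blunt tool: it capitalizes the letter after any non-letter character, so a name like "joe's" would become "Joe'S".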

More examples

Data file

These programs assume you have a text file named "data.txt" in the same folder as your Python program. The data.txt file stores student grades as comma-separated values (CSV format), one student per line. This format is commonly used by spreadsheet programs like Microsoft Excel.

Example data file

This is our example "data.txt" file in CSV format.

Adam,85
Mark,22
Erica,100
Kaitlin,98
Spencer,69
John,88
Wilson,95
Spencer,49
Faith,89
Andrew,90
Celia,90
Mike,90

Writing to a file

This program allows users to append new grades to the data.txt file.

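Note that this program exhibits a bug: if the user enters "exit" in response to the first question, the program still asks for a grade before quitting, because both of the initial questions are asked before the loop condition is checked for the first time.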

#this program allows us to append student grades to an existing data.txt file

#flag to indicate whether we opened the file or not
isFileOpen = False

try:
    myFile = open("data.txt", mode="a")
    isFileOpen = True
except FileNotFoundError:
    print("oops, sorry, didn't find the file... my bad")
except IOError:
    print("There was an IO error")
except:
    print("Sorry, I don't know what went wrong")

#if the file is open, start writing/appending to it
if isFileOpen:
    studentName = input("Please enter a student name:")
    studentGrade = input("Please enter this student's grade:")

    while studentName != "exit" and studentGrade != "exit":
        if studentName == "exit":
            break
        myFile.write(studentName + "," + studentGrade + "\n")
        studentName = input("Please enter a student name:")
        if studentName == "exit":
            break
        studentGrade = input("Please enter this student's grade:")
        if studentGrade == "exit":
            break

    myFile.close()
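One possible way to avoid the "exit" bug is to check each answer immediately after asking for it, inside a single loop. The sketch below is only an illustration of that idea, not a drop-in replacement: the function collect_grades and its ask parameter are hypothetical names introduced here so the logic can be tried out with canned answers instead of a real keyboard. It builds a list of lines rather than writing to data.txt.

```python
def collect_grades(ask=input):
    #returns a List of "name,grade" lines, stopping as soon as any answer is "exit"
    lines = []

    while True:
        studentName = ask("Please enter a student name:")
        if studentName == "exit":
            break #quit right away, without asking for a grade

        studentGrade = ask("Please enter this student's grade:")
        if studentGrade == "exit":
            break

        lines.append(studentName + "," + studentGrade + "\n")

    return lines

#simulate a user who enters one student and then types "exit"
canned_answers = iter(["Adam", "85", "exit"])
demo_lines = collect_grades(lambda prompt: next(canned_answers))
print(demo_lines)
```

Calling collect_grades() with no arguments would use the real input() function, and each returned line could then be passed to myFile.write().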

Reading from a file

This program reads the grades from the data.txt file and outputs the average grade for all students found in the file.

#this program opens a file named "data.txt" that holds a student grade in each line in the format <name>,<grade>
#the program outputs the average grade of all students

#flag to keep track of whether we opened the file or not
isFileOpen = False

#open up a text file in read-only mode
try:
    myFile = open("data.txt", "r")
    isFileOpen = True #set flag to true now that we've opened the file
except FileNotFoundError:
    print("oops, sorry, didn't find the file... my bad")
except IOError:
    print("There was an IO error")
except:
    print("Sorry, I don't know what went wrong")


#if we've opened the file successfully, read from it
if isFileOpen:
    #read file all at once
    #dataFromFile = myFile.read() #read entire file and store in variable
    #print(dataFromFile) #print out entire file


    #keep track of the sum of all grades so we can get the average later
    runningTotal = 0

    #keep track of how many students we find in the text file
    numStudents = 0

    #read one line from a file
    line = myFile.readline()

    #make sure line has something in it
    while line != "":
        #increment the student counter
        numStudents = numStudents + 1

        #strip off trailing line break
        line = line.rstrip("\n")
        #print(line)

        #break up the string along the commas
        data = line.split(",")
        #print(data)
        #add the current student's grade to the running total
        runningTotal = runningTotal + int(data[1])


        #read another line
        line = myFile.readline()


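    #close the file now that we're done reading
    myFile.close()

    #calculate the average grade, unless the file had no grades in it
    if numStudents > 0:
        average = runningTotal/numStudents

        #format it to look nice as a string
        niceLookingAverage = format(average, ".2f")

        #print out the average grade
        print(niceLookingAverage)
    else:
        print("No grades found in the file")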