Text files in Python

From Knowledge Kitchen


Files in Python

Opening a text file

Text files are most commonly opened in one of three modes:

Read mode (the file must already exist, otherwise a FileNotFoundError is raised):

#e.g.
f = open("data.csv", "r") #open in read mode

Write mode (creates the file if it doesn't exist, and erases any existing contents):

#e.g.
f = open("data.csv", "w") #open in write mode

Append mode (creates the file if it doesn't exist, and adds new text to the end of any existing contents):

#e.g.
f = open("data.csv", "a") #open in append mode
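The three modes can be seen working together in the minimal sketch below. It works on a throwaway file in a temporary folder (the name `modes_demo.txt` is made up for illustration): it writes, appends, reads the result back, and then reopens the file in "w" mode to show that write mode erases what was there. The `with` statement is used so that each file is closed automatically.

```python
import os
import tempfile

# work in a temporary folder so no real files are touched
folder = tempfile.mkdtemp()
path = os.path.join(folder, "modes_demo.txt")  # hypothetical file name

# "w" creates the file (or empties it if it already exists)
with open(path, "w") as f:
    f.write("first line\n")

# "a" adds to the end of whatever is already there
with open(path, "a") as f:
    f.write("second line\n")

# "r" reads the current contents
with open(path, "r") as f:
    after_append = f.read()
print(after_append)

# opening in "w" again wipes the old contents before writing
with open(path, "w") as f:
    f.write("replacement\n")

with open(path, "r") as f:
    after_overwrite = f.read()
print(after_overwrite)
```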

Reading the contents of a text file

Read the entire file in one fell swoop:

#e.g.
theFullMonty = f.read() #read entire file

Read just a single line at a time:

#e.g.
line1 = f.readline() #get the first line
line2 = f.readline() #get the second line
#etc...

Loop through all lines:

#e.g.
for line in f:
    #do something with each line (here, just print it)
    print(line)
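The three reading approaches above can be compared side by side in a short sketch. It creates a small sample file first (its name and contents are made up) and shows two details that are easy to miss: every line keeps its trailing line break, and readline() returns an empty string once the end of the file has been reached.

```python
import os
import tempfile

# create a small sample file to read from (hypothetical name and contents)
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

# 1. read() returns everything as one string
with open(path, "r") as f:
    everything = f.read()

# 2. readline() returns one line at a time, line break included,
#    and an empty string "" once the end of the file is reached
with open(path, "r") as f:
    first = f.readline()
    second = f.readline()
    third = f.readline()
    past_end = f.readline()

# 3. a for loop hands over each line in turn, also with its line break
lines = []
with open(path, "r") as f:
    for line in f:
        lines.append(line)

print(repr(everything))
print(repr(first), repr(past_end))
print(lines)
```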

Checking whether a file already exists

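Python's os module contains useful functions for checking whether a given file already exists in the file system: os.path.exists() returns True if anything (a file or a folder) exists at the given path, and os.path.isfile() returns True only if the path points to an existing regular file.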
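A minimal sketch of both checks is shown below. The file names are made up, and `safe_read()` is a hypothetical helper that only opens a file after confirming it exists, returning None otherwise. Catching FileNotFoundError around open() is another common way to handle a missing file.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
existing = os.path.join(folder, "notes.txt")        # hypothetical file name
missing = os.path.join(folder, "no_such_file.txt")  # never created

# create one file so there is something to find
with open(existing, "w") as f:
    f.write("hello\n")

def safe_read(path):
    """Hypothetical helper: return the file's contents, or None if no such file exists."""
    if os.path.isfile(path):
        with open(path, "r") as f:
            return f.read()
    return None

print(os.path.exists(existing))  # True
print(os.path.exists(missing))   # False
print(os.path.isfile(folder))    # False: the folder exists but is not a regular file
print(safe_read(existing))
print(safe_read(missing))        # None
```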

Character encoding issues

There are several popular encoding schemes for text files: ASCII, Latin-1 (ISO-8859-1), UTF-8, UTF-16, etc. (UTF-8 and UTF-16 are two different ways of storing Unicode, the character set that covers nearly every written language.) Each encoding scheme has a different system for converting characters into the numeric codes (bytes) that are ultimately stored in the file. To read and write a text file correctly from a program, you need to know which encoding scheme that file uses.
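To make the difference between encodings concrete, the sketch below encodes the single character "é" with several of the schemes named above using the built-in str.encode() method. The same character turns into different bytes each time, and ASCII cannot store it at all.

```python
text = "é"

# the same character becomes different bytes under different encodings
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian UTF-16, no byte order mark
latin1_bytes = text.encode("latin-1")

print(utf8_bytes)    # b'\xc3\xa9'
print(utf16_bytes)   # b'\xe9\x00'
print(latin1_bytes)  # b'\xe9'

# ASCII has no code for "é", so encoding fails
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False
```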

The problem

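If you don't pass an encoding, Python's open() function uses the default (preferred) encoding of the computer it is running on, and that default differs between systems: Windows often uses a legacy code page such as cp1252, while macOS and Linux usually use UTF-8. Often that's not good enough. If the file was saved with a different encoding, your program may crash with a UnicodeDecodeError when it reads bytes that aren't valid in the assumed encoding (or a UnicodeEncodeError when it writes a character that encoding can't represent), or it may silently read garbled text.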
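The sketch below reproduces the problem on purpose. Because the computer's default varies, it names the encodings explicitly: it saves the word "café" as UTF-8 (in a made-up file), then reads it back assuming ASCII, which crashes, and assuming Latin-1, which "succeeds" but garbles the text. The first line prints this computer's default, which will vary.

```python
import locale
import os
import tempfile

# the encoding open() would use by default on this computer (varies)
print(locale.getpreferredencoding(False))

path = os.path.join(tempfile.mkdtemp(), "cafe.txt")  # hypothetical file

# save the word "café" using UTF-8
with open(path, "w", encoding="utf-8") as f:
    f.write("café")

# reading it back as ASCII fails: the bytes for "é" are not valid ASCII
try:
    with open(path, "r", encoding="ascii") as f:
        f.read()
    ascii_result = "no error"
except UnicodeDecodeError:
    ascii_result = "UnicodeDecodeError"
print(ascii_result)  # UnicodeDecodeError

# reading it back as Latin-1 "works" but silently garbles the text
with open(path, "r", encoding="latin-1") as f:
    garbled = f.read()
print(garbled)  # cafÃ©
```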

One solution

In situations where you need to use a specific encoding scheme, the easiest solution is to specify it when you open the file. In Python 3, the built-in open() function accepts an encoding argument directly. Python's codecs module also provides its own open() function that takes an encoding argument, which you will often see in older code.

#each of the following options opens the file with utf-8 encoding

#option 1: the built-in open() function (Python 3)
f = open("data.csv", mode="r", encoding="utf-8")

#option 2: the codecs module's open() function
import codecs
f = codecs.open("data.csv", mode="r", encoding="utf-8")

Strings in Python

Python strings have lots of useful built-in methods (functions called on a string using dot notation, like name.upper()).

String-related functions that return a String

  • .upper() - returns an uppercase version of the string
  • .lower() - returns a lowercase version of the string
  • .title() - returns a version of the string with the first letter of every word capitalized and the remaining letters lowercase
  • .capitalize() - returns a version of the string with the first letter capitalized and the remaining letters lowercase
  • .strip() - returns a version of the string with any leading and trailing whitespace removed (use .lstrip() to remove only leading whitespace)
  • .rstrip() - returns a version of the string with any trailing whitespace removed
  • .join(some_list) - returns a single string made of all items in the list argument (which must themselves be strings), with the string the method is called on placed between them as a separator
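A quick demonstration of the methods listed above, on made-up sample strings. repr() is used for the stripped strings so that the remaining spaces and line breaks are visible in the output.

```python
s = "hello WORLD"

print(s.upper())        # HELLO WORLD
print(s.lower())        # hello world
print(s.title())        # Hello World
print(s.capitalize())   # Hello world

padded = "  Manhattan \n"
print(repr(padded.strip()))    # 'Manhattan'
print(repr(padded.lstrip()))   # 'Manhattan \n'
print(repr(padded.rstrip()))   # '  Manhattan'

# join is called on the separator, not on the list
print(",".join(["Adam", "85"]))     # Adam,85
print(" - ".join(["a", "b", "c"]))  # a - b - c
```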

String-related functions that return an Integer

  • .find(x) - returns the index position of the first occurrence of x in the String, or -1 if x is not found
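A few .find() calls on sample strings show the index counting from 0, the first-occurrence rule, and the -1 result when nothing matches. When only a yes/no answer is needed, the in operator is a simpler choice.

```python
city = "New York"

print(city.find("York"))      # 4
print(city.find("w"))         # 2
print(city.find("Boston"))    # -1 (not found)
print("banana".find("a"))     # 1 (only the first occurrence counts)

# "in" is a simpler way to ask only whether x appears at all
print("York" in city)         # True
```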

String-related functions that return a Boolean

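  • .isupper() - returns boolean True if the string contains at least one letter and all of its letters are uppercase, False otherwise
  • .islower() - returns boolean True if the string contains at least one letter and all of its letters are lowercase, False otherwise
  • .isnumeric() - returns boolean True if the string is non-empty and every character in it is a numeric character (such as the digits 0-9), False otherwise; note that strings like "-5" and "3.14" return False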
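The sketch below shows the edge cases of these methods. Because .isnumeric() rejects signs and decimal points, it also includes `looks_like_number()`, a hypothetical helper (not a built-in) that instead asks float() whether it can parse the text; be aware that float() also accepts things like "nan" and surrounding spaces.

```python
print("NEW YORK".isupper())   # True
print("New York".isupper())   # False
print("new york".islower())   # True
print("123".isupper())        # False: no letters at all

print("85".isnumeric())       # True
print("3.14".isnumeric())     # False: "." is not a numeric character
print("-5".isnumeric())       # False: neither is "-"
print("".isnumeric())         # False: empty string

def looks_like_number(text):
    """Hypothetical helper: True if float() can parse the text."""
    try:
        float(text)
        return True
    except ValueError:
        return False

print(looks_like_number("3.14"), looks_like_number("-5"), looks_like_number("abc"))  # True True False
```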

String-related functions that return a List

  • .split(some_delimiter) - returns a list based on the contents of the string, by splitting the string everywhere it finds the delimiter specified as the argument; if no argument is given, .split() splits on any run of whitespace (spaces, tabs, line breaks)
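The difference between splitting on an explicit delimiter and splitting with no argument is shown below, using a row from the sloppy CSV data and a made-up line of text.

```python
row = "2,Bareburger,manhattan,new York"
print(row.split(","))    # ['2', 'Bareburger', 'manhattan', 'new York']

# with no argument, split() breaks on any run of whitespace (spaces, tabs, line breaks)
text = "The first line\n  in the   file"
print(text.split())      # ['The', 'first', 'line', 'in', 'the', 'file']

# with an explicit delimiter, adjacent delimiters produce empty strings
print("a,,b".split(","))  # ['a', '', 'b']
```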

Lists in Python

Useful functions

List functions that modify an existing List:

  • .append(some_value) - adds the value as a new element at the end of the list
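A short example of building up a list with .append(), using grades from the example data below. Note that .append() changes the list in place and returns None, so its result should not be assigned back to the list.

```python
grades = []          # start with an empty list
grades.append(85)
grades.append(22)
grades.append(100)
print(grades)        # [85, 22, 100]
print(len(grades))   # 3

# append() changes the list in place and returns None,
# so don't write grades = grades.append(...)
result = grades.append(98)
print(result)        # None
print(grades)        # [85, 22, 100, 98]
```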

Example programs

Data files

The following programs assume you have two text files in the same folder as your Python program (more precisely, in the current working directory the program is run from):

mydata.txt:

The first line in the text file
The second line in the text file

sloppy_data.csv:

1,Peace Food,Manhattan,New York
2,Bareburger,manhattan,new York
3,Why not,manhattan, New york
4,five guys, Manhattan, New York
5,katz DELI,manhattan,new york

Grab entire contents of a text file

Read the entire contents of the file, and print them out.

#open a text file in read mode
f = open("mydata.txt", "r")

#pull all the data out of the file into a variable
theTextInTheFile = f.read()

#close the file now that we're done with it
f.close()

#print out the contents of the file
print(theTextInTheFile)
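Instead of calling f.close() yourself, you can open the file in a with statement, which closes it automatically when the block ends, even if an error occurs while reading. In this sketch the mydata.txt file is recreated in a temporary folder so the code runs anywhere.

```python
import os
import tempfile

# recreate mydata.txt in a temporary folder so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "mydata.txt")
with open(path, "w") as f:
    f.write("The first line in the text file\nThe second line in the text file\n")

# the with block closes the file automatically when it ends
with open(path, "r") as f:
    theTextInTheFile = f.read()

print(f.closed)  # True
print(theTextInTheFile)
```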

Loop through each line of a text file and chop off line breaks

Remove the line break from the end of each line, and then print it out.

#open a text file in read mode
f = open("mydata.txt", "r")

#loop through each line in the file, one by one
for line in f:

    #remove the line break (and any other trailing whitespace) from the end of the string
    line = line.rstrip()

    #print out the line
    print(line)

#close the file
f.close()
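Stripping matters because print() adds its own line break, so printing lines that still end in "\n" produces double-spaced output. The sketch below, again using a temporary copy of mydata.txt, compares the raw lines, lines stripped with rstrip("\n") (which removes only the line break, unlike plain rstrip()), and the alternative of reading the whole file and calling splitlines().

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mydata.txt")
with open(path, "w") as f:
    f.write("The first line in the text file\nThe second line in the text file\n")

with open(path, "r") as f:
    raw_lines = f.readlines()   # each item still ends with "\n"

# rstrip("\n") removes only the line break; plain rstrip() would also
# remove trailing spaces and tabs
clean_lines = [line.rstrip("\n") for line in raw_lines]

# an alternative: read everything and let splitlines() drop the breaks
with open(path, "r") as f:
    also_clean = f.read().splitlines()

print(raw_lines)
print(clean_lines)
print(clean_lines == also_clean)  # True
```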

Loop through all words in a text file

Loop through each word in the file (assuming words are separated by whitespace, such as spaces or line breaks), and print it out.

#open a text file in read mode
f = open("mydata.txt", "r")

#get the full text from the file
theFullText = f.read()

#close the file now that we have its text
f.close()

#split the text into a List of words, breaking at any whitespace (spaces, tabs, line breaks)
words = theFullText.split()

#loop through each word and analyze it
for word in words:

    #print for debugging
    print(word)
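The program calls split() with no argument rather than split(" ") for a reason that the sketch below illustrates, using the contents of mydata.txt as a string literal so it runs without the file: splitting only on spaces leaves line breaks glued to the words on either side of them.

```python
theFullText = "The first line in the text file\nThe second line in the text file\n"

# split() with no argument treats any run of whitespace as a separator
words = theFullText.split()
print(len(words))        # 14
print(words[6:8])        # ['file', 'The']

# split(" ") splits only on single spaces, so line breaks stay glued to words
pieces = theFullText.split(" ")
print(len(pieces))       # 13
print(repr(pieces[6]))   # 'file\nThe'
```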

Count the occurrences of a given word in a text file

Loop through each word in the file (assuming words are separated by whitespace), and print out how many times a search term is found. For simplicity, this program does not account for punctuation: a word with punctuation attached, such as "Ronkonkoma," or "Ronkonkoma.", will not match the search term and will not be counted.

#open a text file in read mode
f = open("mydata.txt", "r")

#get the full text from the file
theFullText = f.read()

#close the file now that we have its text
f.close()

#split the text into a List of words, breaking at any whitespace
words = theFullText.split()

#what word are we looking for?
searchTerm = "Ronkonkoma"

#keep a counter of how many times we found the word that we're looking for
counter = 0

#loop through each word and analyze it
for word in words:

    #check whether the word matches our searchTerm (case-insensitive)
    if word.lower() == searchTerm.lower():

        #if so, increment the counter
        counter = counter + 1

#print out the result
print("We found the word", searchTerm, counter, "times.")
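One way to handle the punctuation problem is to trim punctuation characters from both ends of each word before comparing, using str.strip() with the standard string.punctuation constant. This is a minimal sketch under that assumption: `count_word()` is a hypothetical helper, the sample text is made up, and punctuation inside a word (as in the hyphenated "Ronkonkoma-bound") is left alone, so that word is not counted.

```python
import string

def count_word(text, search_term):
    """Hypothetical helper: count case-insensitive matches of search_term,
    ignoring punctuation stuck to the start or end of each word."""
    counter = 0
    for word in text.split():
        # strip characters such as , . ! ? " from both ends of the word
        cleaned = word.strip(string.punctuation)
        if cleaned.lower() == search_term.lower():
            counter = counter + 1
    return counter

sample = 'Ronkonkoma, ronkonkoma. "RONKONKOMA!" Ronkonkoma-bound'
print(count_word(sample, "Ronkonkoma"))  # 3
```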

Fix sloppy capitalization in a CSV data file

Loop through each line in a CSV text file, loop through each value in the line and modify it in some way (in this example, we simply capitalize a word we are searching for). Store the modified values in a two-dimensional list. Loop through this two-dimensional list and output the modified values to a file.


#####################################################
#PART 1 - SCRAPE THE DATA FROM A CSV FILE
#####################################################

#open a file in read mode
f = open("sloppy_data.csv", "r")

clean_data = [] #this will store the cleaned up data from all lines

#loop through each line in the file
for line in f:

    new_line = [] #this will store the cleaned up list of values in this one line

    line = line.strip() #get rid of line break on every line
    data = line.split(',') #split by commas to get a list of values

    #loop through every value in this list
    for thing in data:

        thing = thing.strip() #get rid of stray spaces around the value

        #if the value is 'manhattan' in any capitalization, convert it to uppercase 'MANHATTAN'
        if thing.lower() == "manhattan":
            thing = thing.upper()

        #append this value to the "clean" list of values in this line
        new_line.append(thing)

    #append this cleaned up data to the list
    clean_data.append(new_line)

#close the file
f.close()


#####################################################
#PART 2 - WRITE THE CLEANED UP DATA TO A NEW CSV FILE
#####################################################

#open a file in write mode
f = open("sloppy_data_fixed.csv", "w")

#loop through each list of cleaned up data... each list represents one line
for line_as_list in clean_data:

    #convert this line (currently stored as a list) to a string with comma-separated values
    line_as_string = ",".join(line_as_list)

    #write this line's data to a file, along with a line break
    f.write(line_as_string + "\n")

#close the file
f.close()
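Splitting on commas breaks as soon as a value itself contains a comma (CSV files put such values in double quotes). Python's standard csv module handles that quoting for you. The sketch below swaps in csv.reader and csv.writer for split() and join(), working on an in-memory string via io.StringIO instead of files; the sample rows are made up, and the cleanup rule (trim spaces, then title-case every value) is an illustrative assumption rather than the program's rule.

```python
import csv
import io

# made-up sample: the second row has a quoted value that contains a comma
sloppy = '1,Peace Food,manhattan,new york\n2,"Why not, Cafe",manhattan, New york\n'

# csv.reader understands quoted values, unlike line.split(',')
rows = []
for row in csv.reader(io.StringIO(sloppy)):
    # tidy each value: trim stray spaces and use title case (assumed cleanup rule)
    rows.append([value.strip().title() for value in row])

# csv.writer adds quotes back wherever a value needs them
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue())
```

When reading or writing real files with the csv module, its documentation recommends opening them with newline="" so that line breaks inside quoted values are handled correctly.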

More examples

Data file

These programs assume you have a text file named "data.txt" in the same folder as your Python program. The data.txt file stores student grades as comma-separated values (CSV format), one student per line in the format <name>,<grade>. This format can be opened and saved by spreadsheet programs such as Microsoft Excel.

Example data file

This is our example "data.txt" file in CSV format.

Adam,85
Mark,22
Erica,100
Kaitlin,98
Spencer,69
John,88
Wilson,95
Spencer,49
Faith,89
Andrew,90
Celia,90
Mike,90


Writing to a file

This program allows users to append new grades to the data.txt file.

Note that this program has a bug: if the user enters "exit" in response to the first question, it still asks for a grade before stopping.

#this program allows us to append student grades to an existing data.txt file

#flag to indicate whether we opened the file or not
isFileOpen = False

try:
    myFile = open("data.txt", mode="a")
    isFileOpen = True
except FileNotFoundError:
    print("oops, sorry, didn't find the file... my bad")
except IOError:
    print("There was an IO error")
except:
    print("Sorry, I don't know what went wrong")

#if the file is open, start writing/appending to it
if isFileOpen:
    studentName = input("Please enter a student name:")
    studentGrade = input("Please enter this student's grade:")

    while studentName != "exit" and studentGrade != "exit":
        if studentName == "exit":
            break
        myFile.write(studentName + "," + studentGrade + "\n")
        studentName = input("Please enter a student name:")
        if studentName == "exit":
            break
        studentGrade = input("Please enter this student's grade:")
        if studentGrade == "exit":
            break

    myFile.close()
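One possible fix for the "exit" bug is to ask for the name at the top of a loop and stop before asking for a grade, as in the sketch below. `append_grades()` is a hypothetical function; it takes the question-asking function as a parameter (input by default) purely so the demo can simulate a user without a keyboard. It also leans on the fact that append mode creates data.txt if it is missing, so in this mode open() will not raise FileNotFoundError for a missing file (only for a missing folder). The original program's try/except could still wrap the call.

```python
import os
import tempfile

def append_grades(path, ask=input):
    """Hypothetical sketch: append name,grade lines until the user types "exit"."""
    with open(path, "a") as myFile:
        while True:
            studentName = ask("Please enter a student name:")
            if studentName == "exit":
                break  # stop before asking for a grade
            studentGrade = ask("Please enter this student's grade:")
            if studentGrade == "exit":
                break
            myFile.write(studentName + "," + studentGrade + "\n")

path = os.path.join(tempfile.mkdtemp(), "data.txt")

# simulate a user typing two students, then "exit"
answers = iter(["Adam", "85", "Mark", "22", "exit"])
append_grades(path, lambda prompt: next(answers))

# simulate a user typing "exit" right away: no grade question is asked
answers = iter(["exit"])
append_grades(path, lambda prompt: next(answers))

with open(path, "r") as f:
    saved = f.read()
print(saved)
```

In a real program you would simply call append_grades("data.txt"), and the questions would be asked with input().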

Reading from a file

This program reads the grades from the data.txt file and outputs the average grade for all students found in the file.

#this program opens a file named "data.txt" that holds a student grade in each line in the format <name>,<grade>
#the program outputs the average grade of all students

#flag to keep track of whether we opened the file or not
isFileOpen = False

#open up a text file in read-only mode
try:
    myFile = open("data.txt", "r")
    isFileOpen = True #set flag to true now that we've opened the file
except FileNotFoundError:
    print("oops, sorry, didn't find the file... my bad")
except IOError:
    print("There was an IO error")
except:
    print("Sorry, I don't know what went wrong")


#if we've opened the file successfully, read from it
if isFileOpen:
    #read file all at once
    #dataFromFile = myFile.read() #read entire file and store in variable
    #print(dataFromFile) #print out entire file


    #keep track of the sum of all grades so we can get the average later
    runningTotal = 0

    #keep track of how many students we find in the text file
    numStudents = 0

    #read one line from a file
    line = myFile.readline()

    #readline() returns an empty string only once the end of the file is reached
    while line != "":
        line = line.rstrip("\n") #strip off trailing line break

        #skip blank lines, which have no grade in them
        if line != "":
            #increment the student counter
            numStudents = numStudents + 1

            #break up the string along the commas
            data = line.split(",")

            #add the current student's grade to the running total
            runningTotal = runningTotal + int(data[1])

        #read another line
        line = myFile.readline()

    #close the file now that we're done reading
    myFile.close()

    #only calculate the average if we found at least one student (avoids dividing by zero)
    if numStudents > 0:
        #calculate the average grade
        average = runningTotal/numStudents

        #format it to look nice as a string
        niceLookingAverage = format(average, ".2f")

        #print out the average grade
        print(niceLookingAverage)
    else:
        print("No grades were found in the file")
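The same calculation can be packaged as a function that uses a with statement and a for loop instead of repeated readline() calls. This is a sketch, not the program above: `average_grade()` is a hypothetical name, it returns None when the file holds no grades, and the demo writes a small excerpt of the example data (plus a blank line) to a temporary file.

```python
import os
import tempfile

def average_grade(path):
    """Hypothetical sketch: return the average of the <name>,<grade> lines
    in path, or None if the file has no grades. Blank lines are skipped."""
    runningTotal = 0
    numStudents = 0
    with open(path, "r") as myFile:
        for line in myFile:
            line = line.strip()
            if line == "":
                continue  # ignore blank lines, e.g. a trailing empty line
            name, grade = line.split(",")
            runningTotal = runningTotal + int(grade)
            numStudents = numStudents + 1
    if numStudents == 0:
        return None
    return runningTotal / numStudents

# build a small excerpt of the example data in a temporary folder
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("Adam,85\nMark,22\nErica,100\n\n")

print(format(average_grade(path), ".2f"))  # 69.00
```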

