Python File Handling
Handle text files and use pickle. Read in the lines of a text file.
Files. Python programs often open, write to, and append to files. Many useful functions for this are built into the language: batteries are included.
With the built-in open() function, we access files. Methods like readlines() read their data. With Python, often a loop is not even needed to read a whole file.
Read text lines. We begin with text files. Even if we just want to display all the lines from a file, newlines must be handled. We read all lines from a file with readlines().
Tip We first must create a new file object. And then we loop over the list returned by readlines().
Raw string This program uses the path syntax for a Windows system. We start the string with "r" to avoid errors with backslashes.
End The parameter to print() modifies the behavior of print. When we use end="" the trailing newline is not printed to the console.
    # Open a file on the disk.
    f = open(r"C:\perls.txt", "r")

    # Print all its lines.
    for line in f.readlines():
        # Modify the end argument.
        print(line, end="")
    Line 1
    Line 2
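The remark that a loop is often not needed can be sketched as follows. This is a minimal example, not the article's own code: the file name sample.txt and the temporary folder are assumptions, used so the sketch does not depend on any path on your disk.

```python
import os
import tempfile

# Hypothetical sample file, created in a temporary folder for this sketch.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")
    with open(path, "w") as f:
        f.write("Line 1\nLine 2\n")

    # read() returns the whole file as one string, with no loop.
    with open(path, "r") as f:
        text = f.read()

# splitlines() splits on line boundaries and drops the newline characters.
lines = text.splitlines()
print(lines)
```

Calling read() is convenient for small files. For very large files, looping over the lines keeps less data in memory at once.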
File object, loop. We do not need readlines() to access all the lines in a file—we do not even need read() or readline. We can loop over the file object directly.
Tip This example handles empty lines, which contain a newline character by itself.
    # Call open() to access the file.
    f = open(r"C:\programs\info.txt", "r")

    for line in f:
        # Empty lines contain a newline character.
        if line == "\n":
            print("::EMPTY LINE::")
            continue

        # Strip the line.
        line = line.strip()
        print(line)
File, info.txt:
    Pets:

    1. Dog
    2. Cat
    3. Bird

Output:
    Pets:
    ::EMPTY LINE::
    1. Dog
    2. Cat
    3. Bird
With. This statement cleans up resources. It simplifies the task of freeing system resources. It is often used with file handling, where open() is a common call, and it improves readability.
First We use "with" in this simple program. The program opens and reads from a file.
Tip This statement makes sure the system resources are cleaned up properly. The with statement is similar to a try-finally statement.
    name = r"C:\perls.txt"

    # Open the file in a with statement.
    with open(name) as f:
        print(f.readline(), end="")

    # Repeat.
    with open(name) as f:
        print(f.readline(), end="")
    First line
    First line
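To make the comparison with try-finally concrete, here is a rough sketch of what the with statement does for a file. It is an approximation under stated assumptions, not the exact expansion Python performs, and the file sample.txt is a hypothetical one created just for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical sample file for this sketch.
    path = os.path.join(folder, "sample.txt")
    with open(path, "w") as out:
        out.write("First line\nSecond line\n")

    # Roughly what "with open(path) as f:" arranges for us.
    f = open(path)
    try:
        first = f.readline()
    finally:
        # Runs whether or not readline() raised an error.
        f.close()

print(first, end="")
print(f.closed)
```

The with form is shorter and harder to get wrong, which is why it is usually preferred for files.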
Pickle, list. Often we need to store objects. With pickle, we write collections such as lists to a data file. It supports many objects. The with statement improves resource cleanup.
However In this example, we create a list. We pass this list to pickle.dump().
Dump This writes the list contents in binary form to the file f.pickle. The extension (pickle) has no importance.
Then After we call pickle.dump(), we ignore the original list in memory. We load that same data back from the disk with pickle.load().
    import pickle

    # Input list data (named items so the built-in list type is not shadowed).
    items = ["one", "two", "three"]
    print("before:", items)

    # Open the file and call pickle.dump.
    with open("f.pickle", "wb") as f:
        pickle.dump(items, f)

    # Open the file and call pickle.load.
    with open("f.pickle", "rb") as f:
        data = pickle.load(f)

    print("after:", data)
    before: ['one', 'two', 'three']
    after: ['one', 'two', 'three']
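Since pickle supports many objects, a small sketch with a nested dictionary may help. The record below is hypothetical data invented for the example, and it uses pickle.dumps() and pickle.loads(), which work on bytes in memory instead of a file.

```python
import pickle

# Hypothetical record mixing several built-in types.
record = {
    "name": "cat",
    "legs": 4,
    "tags": ("pet", "mammal"),
    "weights": [3.5, 4.1],
}

# dumps() returns a bytes object instead of writing to a file.
blob = pickle.dumps(record)

# loads() rebuilds an equal, but separate, object.
copy = pickle.loads(blob)

print(copy == record)
print(copy is record)
```

One caution from the Python documentation applies here: only unpickle data from a source you trust, because loading a pickle can run arbitrary code.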
New, empty file. The second argument to open() is a string containing "mode" flag characters. The "w" specifies write-only mode—no appending or reading is done.
Erased If the file happens to exist, it is erased. So be careful when developing programs with this call.
    # Create new empty file.
    # ... If the file exists, it will be cleared of content.
    f = open("C:\\programs\\test.file", "w")

    # Close the file so the handle is released.
    f.close()
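Because "w" erases an existing file, it can help to compare it with two other mode flags. The sketch below assumes Python 3.3 or later (for the "x" mode) and uses a hypothetical file in a temporary folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file path for this sketch.
    path = os.path.join(folder, "test.file")

    # "w" creates the file, or clears it if it already exists.
    with open(path, "w") as f:
        f.write("cat\n")

    # "a" keeps the existing content and writes at the end.
    with open(path, "a") as f:
        f.write("bird\n")

    # "x" refuses to overwrite: it raises FileExistsError if the file exists.
    try:
        with open(path, "x") as f:
            f.write("never written\n")
        created = True
    except FileExistsError:
        created = False

    with open(path) as f:
        content = f.read()

print(content, end="")
print(created)
```

Using "x" during development is one way to avoid clearing a file by accident.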
Write lines. This program writes lines to a file. It first creates an empty file for writing. It specifies the "w" mode to create an empty file. Then it writes two lines.
Tip The line separators (newline chars) are needed. There is no "writeline" method available.
    # Create an empty file for writing.
    with open("C:\\programs\\test.file", "w") as f:
        # Write two lines to the file.
        f.write("cat\n")
        f.write("bird\n")
    cat
    bird
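Although there is no "writeline" method, file objects do have writelines(), which writes each string from an iterable. Like write(), it adds no newlines of its own. A minimal sketch, with a hypothetical list and a temporary file:

```python
import os
import tempfile

# Hypothetical data for this sketch.
animals = ["cat", "bird", "dog"]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "test.file")

    with open(path, "w") as f:
        # writelines() does not add newlines, so we append them ourselves.
        f.writelines(name + "\n" for name in animals)

    with open(path) as f:
        written = f.read()

print(written, end="")
```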
Count character frequencies. This program opens a file and counts each character using a frequency dictionary. It combines open(), readlines, and dictionary's get().
Strip The program strips each line because we do not want to bother with newline characters.
Get The code uses the two-argument form of get. If a value exists, it is returned—otherwise, 0 is returned.
Example text, file.txt, Python:
aaaa bbbbb aaaa bbbbb aaaa bbbbb CCcc xx y y y y y Z
    # Open a file.
    f = open(r"C:\programs\file.txt", "r")

    # Stores character counts.
    chars = {}

    # Loop over file and increment a key for each char.
    for line in f.readlines():
        for c in line.strip():
            # Get existing value for this char or a default of zero.
            # ... Add one and store that.
            chars[c] = chars.get(c, 0) + 1

    # Print character counts.
    for item in chars.items():
        print(item)
    ('a', 12)
    (' ', 5)
    ('C', 2)
    ('b', 15)
    ('c', 2)
    ('y', 5)
    ('x', 2)
    ('Z', 1)
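As an alternative to the dictionary and get() pattern, the standard library's collections.Counter can do the same counting. This is a swapped-in technique, not the program above, and the sketch uses io.StringIO with made-up content as a stand-in for an opened text file.

```python
import io
from collections import Counter

# io.StringIO stands in for an opened text file (hypothetical content).
f = io.StringIO("aab\nb c\n")

chars = Counter()
for line in f:
    # update() adds one to the count of each character in the string.
    chars.update(line.strip())

# most_common() lists (character, count) pairs, highest counts first.
for item in chars.most_common():
    print(item)
```

A Counter returns 0 for missing keys, so no default value needs to be passed when reading a count.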
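IOError. This program causes an IOError to occur. The file "nope.txt" is most likely not present on the computer. The open() function raises the error with the "No such file or directory" message. In Python 3.3 and later the traceback names FileNotFoundError, a subclass of OSError; IOError remains as an alias of OSError, so code that catches IOError still works.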
    # An invalid path.
    name = "/nope.txt"

    # Attempt to open the file.
    with open(name) as f:
        print(f.readline())
    Traceback (most recent call last):
      File "...", line 7, in <module>
        with open(name) as f:
    IOError: [Errno 2] No such file or directory: '/nope.txt'
Exists. We can prevent the IOError by first testing the path with os.path.exists(). This returns True if the path exists, and False otherwise. Here, the function returns False, so open() is never reached.
And No error is ever encountered, because we avoid trying to open the file in the first place.
    import os

    # A file that does not exist.
    name = "/nope.txt"

    # See if the path exists.
    if os.path.exists(name):
        # Open the file.
        with open(name) as f:
            print(f.readline())
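One detail worth knowing is that os.path.exists() also returns True for directories, which open() cannot read as files. A small sketch of the difference, using os.path.isfile() and a temporary folder (both the folder and the name nope.txt are assumptions for the example):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file name that is never created.
    missing = os.path.join(folder, "nope.txt")

    # The folder exists, but it is not a regular file.
    folder_exists = os.path.exists(folder)
    folder_is_file = os.path.isfile(folder)

    # The missing file fails the exists() test.
    missing_exists = os.path.exists(missing)

print(folder_exists, folder_is_file, missing_exists)
```

When only regular files should be opened, os.path.isfile() may be the more precise test.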
Except example. This example uses a try-except construct to capture errors. When open() raises an error, control flow enters the except-block.
And The Python program does not terminate—instead, the error is trapped and handled.
    try:
        # Does not exist.
        name = "/nope.txt"
        # Attempt to open it.
        with open(name) as f:
            print(f.readline())
    except IOError:
        # Handle the error.
        print("An error occurred")
An error occurred
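The except-block can also name more specific exception types. The sketch below assumes Python 3.3 or later, where a missing file raises FileNotFoundError, a subclass of OSError. The path is a hypothetical file inside an empty temporary folder, so it is known not to exist.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file name inside an empty temporary folder.
    name = os.path.join(folder, "nope.txt")
    try:
        with open(name) as f:
            print(f.readline())
        result = "read"
    except FileNotFoundError as error:
        # The path is missing: report which file was requested.
        result = "missing"
        print("Not found:", os.path.basename(error.filename))
    except OSError as error:
        # Any other I/O problem, such as a permission error.
        result = "other"
        print("I/O error:", error)
```

Listing the specific exception first lets the program treat a missing file differently from other I/O failures.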
Benchmark readlines, read. There is significant overhead in accessing a file for a read. Here we benchmark file usage on a file with about 1000 lines.
Version 1 This version of the code uses the readlines() method and then loops over each line, calling len on each line.
Version 2 Here we call read() on the file, and then access the len of the entire file at once.
Result It was about 2.8 times faster to read the entire file in a single call with the read() method: 2.61 s against 7.38 s for readlines() over 10000 iterations.
File, line repeated 1000 times, test.file:
This is an interesting file. This is an interesting file. ...
    import time

    print(time.time())

    # Version 1: use readlines.
    i = 0
    while i < 10000:
        with open("C:\\programs\\test.file", "r") as f:
            count = 0
            for line in f.readlines():
                count += len(line)
        i += 1

    print(time.time())

    # Version 2: use read.
    i = 0
    while i < 10000:
        with open("C:\\programs\\test.file", "r") as f:
            count = 0
            data = f.read()
            count = len(data)
        i += 1

    print(time.time())
    1406148416.003978
    1406148423.383404    readlines = 7.38 s
    1406148425.989555    read      = 2.61 s
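The same comparison can be sketched with the standard timeit module, which handles the repeat loop and uses a clock suited to timing. This is not the benchmark above: the temporary file, the helper names use_readlines and use_read, and the repeat count of 200 are all assumptions chosen to keep the sketch quick. Timings will vary by machine, so none are shown.

```python
import os
import tempfile
import timeit

with tempfile.TemporaryDirectory() as folder:
    # Build a 1000-line test file like the one described above.
    path = os.path.join(folder, "test.file")
    with open(path, "w") as f:
        f.write("This is an interesting file.\n" * 1000)

    def use_readlines():
        # Version 1: sum the length of each line.
        with open(path, "r") as f:
            return sum(len(line) for line in f.readlines())

    def use_read():
        # Version 2: take the length of the whole file at once.
        with open(path, "r") as f:
            return len(f.read())

    count_readlines = use_readlines()
    count_read = use_read()

    # number=200 is an arbitrary, small repeat count for this sketch.
    t_readlines = timeit.timeit(use_readlines, number=200)
    t_read = timeit.timeit(use_read, number=200)

print(count_readlines, count_read)
print("readlines:", t_readlines, "read:", t_read)
```

Both versions count the same number of characters, which is a useful check that the two loops are doing comparable work.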
Read binary data. A Python program can read binary data from a file. We must add a "b" at the end of the mode argument. We call read() to read the entire file into a bytes object.
Here A file on the local disk is read. This is a gzip file, which has special bytes at its start.
    # Read file in binary form.
    # ... Specify "b" for binary mode ("rb" opens for binary reading).
    f = open(r"C:\stage-perls-cf\file-python", "rb")

    # Read the entire file, then close it.
    data = f.read()
    f.close()

    # Print length of result bytes object.
    # ... Print first three bytes (the start of the gzip header).
    print(len(data))
    print(data[0])
    print(data[1])
    print(data[2])
    42078
    31
    139
    8
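The values 31 and 139 are the two gzip magic bytes (0x1f and 0x8b), and 8 marks the deflate compression method. A minimal sketch of testing for them, which creates its own small gzip file with the standard gzip module since the file above is not available (the name sample.gz is an assumption):

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a small gzip file to inspect.
    path = os.path.join(folder, "sample.gz")
    with gzip.open(path, "wb") as f:
        f.write(b"hello")

    # Read only the first three bytes in binary mode.
    with open(path, "rb") as f:
        header = f.read(3)

# Indexing a bytes object gives integers; slicing gives bytes.
print(header[0], header[1], header[2])

# Compare the first two bytes with the gzip magic number.
is_gzip = header[:2] == b"\x1f\x8b"
print(is_gzip)
```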
Formats. Markup files are often used in computer programs. We handle HTML and XML files. There are many ways to parse or scan these formats—we show HTMLParser and Expat.
CSV files Parsing CSV files is important, but it can be tedious. We introduce the csv module to help make it easier.
Textwrap The textwrap module can be used to rewrap text files. This can improve the formatting of files.
A summary. File handling is an important yet error-prone aspect of program development. It is essential. It gives us data persistence.
Dot Net Perls is a collection of tested code examples. Pages are continually updated to stay current, with code correctness a top priority.
Sam Allen is passionate about computer languages. In the past, his work has been recommended by Apple and Microsoft and he has studied computers at a selective university in the United States.
This page was last updated on 9/5/2022 (simplify).
© 2007-2022 sam allen.