PR1 – Preparation for Week 7 – Reading a file

Author: Carlos Santos
Learning Line: Programming
Course: PR1: Introduction to Programming
Week: 6
Competencies: Students will be able to effectively define and use variables and programming flow control.
BoKS: 3bK2, The student understands the principles of data related software like Python, R, Scala and/or Java and knows how to write (basic) scripts.
Learning Goals: Able to read text files

Text File I/O

Before we start with how to handle files within Python, it’s important to understand what a file is.

At its core, a file is a contiguous set of bits (0s and 1s). This data is organized in a specific format and can be anything as simple as a text file or as complicated as an encoded image or a program executable.

What the data stored in a file represents, and what function it has within the computer, depends on the format specification used, which is typically indicated by the file extension.

For example, a file that has an extension of .gif most likely conforms to the Graphics Interchange Format specification. There are hundreds, if not thousands, of file extensions out there. For this tutorial, you’ll only deal with text files, such as those with a .txt or .csv extension.

Text files are structured as a sequence of characters, conceptually ended by an end-of-file (EOF) marker; most modern operating systems keep track of the file's length rather than storing an actual EOF character. On most operating systems, the name text file refers to a file format that allows only plain text content with no formatting. Such files can be viewed and edited in terminals and text editors, usually together with additional information indicating the encoding.

Text files contain the typical characters, such as letters, digits, and punctuation, but also several special characters, such as spaces, tabs, and newlines. It is interesting to know that different operating systems use different rules for ending a line: in Windows, each line of text is terminated by a two-character combination, carriage return (CR) followed by line feed (LF); in Unix-like operating systems, including modern macOS, lines are terminated by LF alone; and in the classic Mac OS (before Mac OS X), lines were terminated by CR alone.
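As a small illustration of these line-ending conventions, the sketch below writes two lines with Windows-style endings and reads them back in two ways. It assumes Python 3's default text mode, which translates CR LF, CR, and LF into "\n" when reading. The file name line_endings_demo.txt and the temporary folder are made up for this example.

```python
import os
import tempfile

# Hypothetical file in a temporary folder, used only for this demonstration.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "line_endings_demo.txt")

# Write raw bytes so that we control the exact line endings (Windows style: CR LF).
with open(path, "wb") as f:
    f.write(b"first line\r\nsecond line\r\n")

# Binary mode shows the bytes exactly as they are stored.
with open(path, "rb") as f:
    raw = f.read()

# Text mode (the default) translates the line endings to "\n".
with open(path, encoding="utf-8") as f:
    text = f.read()

print(raw)         # b'first line\r\nsecond line\r\n'
print(repr(text))  # 'first line\nsecond line\n'
```

Because of this translation, a script that reads text files usually only has to deal with "\n", whichever operating system created the file.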

File Encoding

In computing, data storage, and data transmission, encoding is used to represent a repertoire of characters for textual data.

Early character codes associated each character with an optical representation or a binary value. They could represent only a small subset of the characters used in written languages, sometimes restricted to Latin-based letters, digits, and some punctuation. As the cost of storage and transmission fell, more elaborate character codes (such as Unicode) became able to represent most of the characters used in many written languages. Character encoding using internationally accepted standards permits worldwide interchange of text in electronic form.

Character                   Unicode code point   Glyph
Latin A                     U+0041               A
Latin sharp S               U+00DF               ß
Han for East                U+6771               東
Ampersand                   U+0026               &
Inverted exclamation mark   U+00A1               ¡
Section sign                U+00A7               §
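The code points in the table can be checked from Python itself. The sketch below is only an illustration: it uses the built-in ord() function to get the code point of each character and str.encode() to show the bytes that one common encoding, UTF-8, would store in a file. The choice of UTF-8 is an assumption made for this example; other encodings produce different bytes.

```python
# The characters from the table above.
characters = ["A", "ß", "東", "&", "¡", "§"]

code_points = []
for char in characters:
    # ord() returns the Unicode code point as an integer; format it as U+XXXX.
    code_point = f"U+{ord(char):04X}"
    code_points.append(code_point)
    # encode() turns the character into the bytes stored under the given encoding.
    utf8_bytes = char.encode("utf-8")
    print(char, code_point, utf8_bytes)
```

Characters in the basic Latin range, such as A and &, take one byte in UTF-8, while the others in this list take two or three.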

Text Editors

There are multiple text editors, and most programming environments are in essence text editors with attached compilers or interpreters.

If you do not use one, or do not have one you prefer, I suggest you become familiar with one: you will be using it a lot, and specific features such as encoding selection, file browsing, syntax highlighting, macros, and integration with source control can be very useful. Here are some of the best known:

In the same working folder as your script, please create a text file (I named mine testfile.txt) and write something in it; it needs to have at least two or three lines.

Read a file

Reading a file is really easy in Python. You request to open a file, you perform operations on it, and then you close it.

Closing a file is not only good coding practice, it is mandatory.

Closing a file is important even after read operations: the OS keeps track of which programs are reading and writing each file in order to avoid corruption (for example, one program reading a section of a file while another program is writing to it). So open the file, perform all the operations you need in an efficient way, and then close it.

Data corruption can occur when two programs are competing for the same file, normally with at least one of them writing to the file. The operating system reduces the risk but does not fully eliminate it. More on this topic will be covered during optimization.

# Open the file (read mode is the default), print its whole content, then close it
f = open('testfile.txt')
print(f.read())
f.close()
Hello World
Please store in somewhere safe!
This is the last, line! Bye

The second way to close a file is to use the with statement:

with open('testfile.txt') as f:
    print(f.read())

The with statement automatically takes care of closing the file once it leaves the with block, even in cases of error. I highly recommend that you use the with statement as much as possible, as it allows for cleaner code and makes handling any unexpected errors easier for you.
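To see this behaviour in action, the sketch below raises an error on purpose inside a with block and then checks the closed attribute of the file object. The sample file is created in a temporary folder so that the example runs on its own, and the error itself is simulated for the demonstration.

```python
import os
import tempfile

# Hypothetical sample file so the example runs on its own.
path = os.path.join(tempfile.mkdtemp(), "testfile.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Hello World\n")

try:
    with open(path, encoding="utf-8") as f:
        first_line = f.readline()
        # Simulate something going wrong while the file is still open.
        raise ValueError("simulated problem while processing the file")
except ValueError as error:
    print("Error caught:", error)

# The with block closed the file even though an exception was raised.
print(f.closed)  # True
```

With the open/close version, the same error would skip the call to close() unless it were wrapped in a try/finally block.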

Reading Opened Files

Once you’ve opened up a file, you’ll want to read its content. There are multiple methods that can be called on a file object to help you out:

Method              What It Does
.read(size=-1)      Reads at most size characters from the file (bytes, if the file was opened in binary mode). If no argument is passed, or None or -1 is passed, the entire file is read.
.readline(size=-1)  Reads at most size characters from the current line. If no argument is passed, or None or -1 is passed, the entire line (or the rest of the line) is read, including the newline character at its end, if there is one.
.readlines()        Reads the remaining lines from the file object and returns them as a list.
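The three methods can be combined on the same open file, because each one continues from where the previous one stopped. The sketch below is a minimal illustration; it creates its own sample file in a temporary folder, with content similar to the testfile.txt used in this tutorial.

```python
import os
import tempfile

# Hypothetical sample file so the example runs on its own.
path = os.path.join(tempfile.mkdtemp(), "testfile.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Hello World\nPlease store in somewhere safe!\nThis is the last, line! Bye\n")

with open(path, encoding="utf-8") as f:
    first_five = f.read(5)       # the first 5 characters of the file
    rest_of_line = f.readline()  # the rest of the first line, newline included
    remaining = f.readlines()    # all the remaining lines, as a list

print(repr(first_five))    # 'Hello'
print(repr(rest_of_line))  # ' World\n'
print(remaining)
```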

Reading line by line

with open('testfile.txt') as f:
    line = f.readline()
    # readline() returns an empty string once the end of the file is reached
    while line != "":
        # do something with the line
        print(line)
        line = f.readline()
Hello World

Please store in somewhere safe!

This is the last, line! Bye
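The blank lines in the output above appear because each line read from the file already ends with a newline character and print adds another one. A possible alternative, sketched below, is to loop over the file object directly and strip the newline before printing. The sample file and the clean_lines list are only there to make the example self-contained.

```python
import os
import tempfile

# Hypothetical sample file so the example runs on its own.
path = os.path.join(tempfile.mkdtemp(), "testfile.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Hello World\nPlease store in somewhere safe!\nThis is the last, line! Bye\n")

clean_lines = []
with open(path, encoding="utf-8") as f:
    # A file object can be looped over directly, one line at a time.
    for line in f:
        clean = line.rstrip("\n")  # drop the newline that ends the line
        clean_lines.append(clean)
        print(clean)
```

Looping over the file object reads one line at a time, just like the while loop with readline(), but with less code.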

Count the number of lines in a file

with open('testfile.txt') as f:
    count = len(f.readlines())
print(count)
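Calling readlines() loads every line into a list before counting, which is fine for small files. For large files, one possible variation is to count the lines while looping over the file, so that only one line is held in memory at a time. The sketch below assumes a small sample file created just for the example.

```python
import os
import tempfile

# Hypothetical sample file so the example runs on its own.
path = os.path.join(tempfile.mkdtemp(), "testfile.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Hello World\nPlease store in somewhere safe!\nThis is the last, line! Bye\n")

with open(path, encoding="utf-8") as f:
    # Add 1 for every line, without building a list of all the lines.
    count = sum(1 for _ in f)

print(count)  # 3
```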

Show the first character of each line

with open('testfile.txt') as f:
    content = f.readlines()
    for line in content:
        print(line[0])
H
P
T

Show the last word of each line

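with open('testfile.txt') as f:
    content = f.readlines()
    for line in content:
        # split() without arguments splits on any whitespace and drops the
        # newline at the end of the line
        words = line.split()
        # skip empty lines, which have no words
        if words:
            print(words[-1])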
World
safe!
Bye