Michał Oręziak
micorix blog


3 Python tricks for reading files


useful for the MATURA EXAM

Michał Oręziak
·Oct 24, 2021·

4 min read


Hi there! I consider myself more of a front-end developer than a back-end one, and I'm not gonna lie: I didn't like Python at first. Some shortcuts seemed unintuitive. At school, we were taught Java, but in my opinion neither Java nor C++ was the best fit for MATURA exam tasks. That's why I decided to stick with Python.

In this article, I'm gonna describe a few tricks for reading files. They may seem basic, but they were really helpful for me at the time. I'm not a pro, so any comments on how to improve the described solutions are more than welcome!

To illustrate our examples, we will be working with a text file containing binary numbers (one number per line).

10010100010100
101011101101011
1000111101010
...

1. enumerate function

Say we want to have access to the line number, for instance just for the sake of displaying it. In that case, we can use the enumerate function, which gives us the index of each item alongside the item itself while we loop over an iterable (in this case the list returned by f.readlines()).

Before:

f = open(FILENAME)
i = 0
for line in f.readlines():
  print(f"Line number: {i}. Line: {line.strip()}") # Line number: 0. Line: 10010100010100
  i += 1
f.close()

After:

f = open(FILENAME)
for i, line in enumerate(f.readlines()):
  print(f"Line number: {i}. Line: {line.strip()}")
f.close()
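Note that the counter starts at 0. If you would rather count lines from 1, enumerate accepts a start argument. Here is a minimal sketch; the temporary sample file and the numbered list are my own additions, there only so the snippet runs on its own:

```python
import os
import tempfile

# Hypothetical sample file, created only so this snippet is self-contained.
FILENAME = os.path.join(tempfile.mkdtemp(), "numbers.txt")
out = open(FILENAME, "w")
out.write("10010100010100\n101011101101011\n1000111101010\n")
out.close()

numbered = []  # collected here just to make the result easy to inspect
f = open(FILENAME)
# start=1 makes the counter begin at 1 instead of the default 0
for i, line in enumerate(f.readlines(), start=1):
    numbered.append((i, line.strip()))
    print(f"Line number: {i}. Line: {line.strip()}")
f.close()
```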

If you wonder what that strange f before a string is, let me tell you. It's an f-string. Easy, right? You can use it to put variables or any other expressions inside your string.
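To show what "expressions" can mean here, a small sketch using one of our binary numbers. The variable names are mine and chosen only for illustration:

```python
line = "10010100010100"
ones = line.count("1")

# Any expression can go inside the braces, not just a variable name.
message = f"{line} has {ones} ones and {len(line) - ones} zeros"
print(message)

# A format spec after the colon controls how the value is displayed;
# "b" renders an integer in binary.
value = int(line, 2)  # parse the text as a base-2 number
print(f"{line} is {value} in decimal, and {value:b} in binary again")
```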


2. with statement

You might have noticed that in the previous examples I used f.close(), which is not something you usually bother with when you are trying to finish an exam sheet in time. Calling f.close() is good practice because it tells your program that you are no longer using the file, so it can perform some cleanup (flushing buffered writes and releasing the file handle). This matters most when dealing with many or huge files. So while it is not strictly necessary in a short script, it's still better to use it. The good news is that you can use the with statement and it'll close the file for you!

Before:

f = open(FILENAME)
for line in f.readlines():
  print(line.strip())
f.close()

After:

with open(FILENAME) as f:
  for line in f.readlines():
    print(line.strip())
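If you want to convince yourself that the file really gets closed, you can check f.closed after the block. The sketch below also shows that this holds when an error is raised inside the block. The sample file and the ValueError are made up purely for the demonstration:

```python
import os
import tempfile

# Hypothetical sample file, created only so this snippet is self-contained.
FILENAME = os.path.join(tempfile.mkdtemp(), "numbers.txt")
with open(FILENAME, "w") as out:
    out.write("10010100010100\n101011101101011\n1000111101010\n")

with open(FILENAME) as f:
    first_line = f.readline().strip()
# We are outside the with block now, so the file has been closed for us.
print(f.closed)

# The file is closed even if something goes wrong inside the block.
try:
    with open(FILENAME) as g:
        raise ValueError("pretend the processing failed here")
except ValueError:
    pass
closed_after_error = g.closed
print(closed_after_error)
```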

3. Generators

What the heck is it, you may ask. Generators are a super-fun way to access huge loads of data. Imagine that you have a reeaallyy huge text file. It would take a lot of time and memory to load the whole content into a list and then process that list (like with f.readlines()). Instead, we can use a generator, which allows us to process the data one chunk at a time. A generator is a device that says: "I'll give you that small chunk of data. Work on it. If you need the next chunk, tell me. I'll prepare one for you, and you'll work on the next one." This is not the only use case of generators, though. You can also use them to prepare additional data needed for iterations.
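Before we touch any files, here is a tiny sketch of that "one chunk at a time" conversation. The chunks function and the events list are hypothetical names, used only to record the order in which things happen:

```python
events = []  # records what happened, in order

def chunks():
    for number in ("101", "110", "111"):
        events.append(f"preparing {number}")
        yield number  # hand over one chunk, then wait until asked again

for number in chunks():
    events.append(f"working on {number}")

for event in events:
    print(event)
```

The preparing and working lines alternate: each chunk is produced only when the loop asks for it, not all up front.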

Consider the following example:

def get_bin_numbers():
  numbers = []
  with open(FILENAME) as f:
    # readlines() loads the whole file into a list first
    for line in f.readlines():
      numbers.append((line.count('1'), line))
  return numbers

for ones_count, line in get_bin_numbers():
  print(f"Doing some things with {ones_count} ones in {line}")

After:

def bin_numbers():
  with open(FILENAME) as f:
    # iterating over f reads the file one line at a time
    for line in f:
      yield line.count("1"), line

for ones_count, line in bin_numbers():
  print(f"Doing some things with {ones_count} ones in {line}")

What does the bin_numbers function return, and what's with the yield thing? The get_bin_numbers function returned a list; here the function returns a generator object. How is that different? Remember what I said about the primary idea behind generators? That's what yield is for. yield pauses the function's execution and hands the current chunk over to the caller. When the caller asks for the next chunk, the function resumes right where it stopped.
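You can watch this happen by asking for chunks by hand with the built-in next function. This is only a sketch: the sample file is created just for the demo, and I strip the newline from each line to keep the printed output tidy:

```python
import os
import tempfile

# Hypothetical sample file, created only so this snippet is self-contained.
FILENAME = os.path.join(tempfile.mkdtemp(), "numbers.txt")
with open(FILENAME, "w") as out:
    out.write("10010100010100\n101011101101011\n1000111101010\n")

def bin_numbers():
    with open(FILENAME) as f:
        for line in f:
            yield line.count("1"), line.strip()

gen = bin_numbers()   # nothing has been read yet, we only got a generator object
first = next(gen)     # runs the function until the first yield
second = next(gen)    # resumes after that yield and runs until the next one
rest = list(gen)      # consumes whatever is left
print(first, second, rest)
```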

Caveats

  1. With the described approach, the file is opened and read again every time a new generator object is created, which affects performance. However, during the MATURA it may not be that important, given that execution time is in most cases not evaluated. If you want a more performant solution, you can just create a list of data beforehand and reuse it in the next exercises. For me though, generators simplify the code, and I can accept that my program takes a bit longer to execute.
  2. Generators' use cases are not limited to reading files. You can dig online and you'll find some magic.
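Both caveats can be sketched in a few lines. The sample file and the three "exercises" below are made up for illustration; the idea is just to read the file once into a list and then reuse that list:

```python
import os
import tempfile

# Hypothetical sample file, created only so this snippet is self-contained.
FILENAME = os.path.join(tempfile.mkdtemp(), "numbers.txt")
with open(FILENAME, "w") as out:
    out.write("10010100010100\n101011101101011\n1000111101010\n")

def bin_numbers():
    with open(FILENAME) as f:
        for line in f:
            yield line.count("1"), line.strip()

# Caveat 1: read the file once, then reuse the list in every exercise.
data = list(bin_numbers())

# Made-up exercise A: the number with the most ones
# (tuples compare by their first element first).
most_ones = max(data)

# Made-up exercise B: numbers with an even count of ones.
even_ones = [line for ones_count, line in data if ones_count % 2 == 0]

# Caveat 2: a generator expression works on any iterable, not just files.
total_ones = sum(ones_count for ones_count, _ in data)

print(most_ones, even_ones, total_ones)
```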

Here are some links with a bit more advanced stuff for further reading:

BTW: as you might notice Towards Data Science has great articles on Python. Check them out.


Thanks for reading! If you have any questions or suggestions, please drop them below or DM me on Twitter!

 