Wednesday, 14 November 2018

Why is opening and iterating over a file handle more than twice as fast in Python 2 as in Python 3?

I can't work out why it's so much faster to parse this file in Python 2.7 than in Python 3.6. I've seen the same pattern independently on macOS and on Arch Linux. Can others reproduce it? Is there an explanation?

Warning: the code snippet writes a ~2GB file to the current directory.
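The ~2GB figure follows from the generator loop in test.py: each of the 1,000,000 records is a `>N` header line followed by five 401-character sequence lines. A small sketch that repeats that arithmetic without touching the disk; the function name `expected_size` is hypothetical, and the count assumes each `'\n'` is written as a single byte (true on macOS and Linux; Windows text mode would write `'\r\n'`):

```python
# Sketch: recompute the size of many_medium.fa without writing it.
# Assumes each '\n' is written as one byte (no '\r\n' translation).
SEQ_LINE = 'ATCGN' * 80 + '\n'  # 401 characters per sequence line


def expected_size(n_records=1000000, lines_per_record=5):
    """Return the byte count the generator loop in test.py would produce."""
    total = 0
    for i in range(n_records):
        total += len('>{}\n'.format(i))            # header line, e.g. ">42\n"
        total += lines_per_record * len(SEQ_LINE)  # the sequence lines
    return total


if __name__ == '__main__':
    size = expected_size()
    print(size, 'bytes =', round(size / 1e9, 2), 'GB')
```

This comes to 2,012,888,890 bytes, about 2.01 GB (1.87 GiB), of which the header lines contribute under 8 MB.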

Timings (total seconds for 5 passes over the file, as reported by timeit):

$ python2 test.py 
5.01580309868
$ python3 test.py 
10.664075019994925
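Because `timeit(..., number=5)` returns the total for five passes, the per-pass times and the slowdown factor can be worked out directly from the two timings reported above. A small arithmetic sketch (the constant names are illustrative only):

```python
# Timings copied from the runs above (total seconds for 5 passes each).
PY2_TOTAL = 5.01580309868
PY3_TOTAL = 10.664075019994925
PASSES = 5

py2_per_pass = PY2_TOTAL / PASSES
py3_per_pass = PY3_TOTAL / PASSES
slowdown = PY3_TOTAL / PY2_TOTAL

print('Python 2: {:.2f} s per pass'.format(py2_per_pass))
print('Python 3: {:.2f} s per pass'.format(py3_per_pass))
print('Python 3 / Python 2: {:.2f}x'.format(slowdown))
```

So one pass over the file takes about 1.0 s under Python 2 and about 2.1 s under Python 3, a ratio of roughly 2.13.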

Code for test.py:

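import os
from timeit import timeit

# One FASTA sequence line: 400 bases plus a newline (401 characters).
SEQ_LINE = 'ATCGN' * 80 + '\n'

# Generate the ~2GB test file only once: 1,000,000 records, each a
# ">N" header line followed by 5 sequence lines.
if not os.path.isfile('many_medium.fa'):
    with open('many_medium.fa', 'w') as out_f:
        for i in range(1000000):
            out_f.write('>{}\n'.format(i))
            for _ in range(5):
                out_f.write(SEQ_LINE)

def f():
    # Iterate over every line and discard it, so the timing reflects
    # only the cost of reading the file line by line.
    with open('many_medium.fa') as fh:
        for line in fh:
            pass

# Total time for 5 full passes over the file.
print(timeit('f()', setup='from __main__ import f', number=5))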
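One factor worth testing, offered here as a hypothesis rather than a confirmed diagnosis: in Python 3, `open()` in its default text mode decodes every line from bytes to `str` (using the locale's encoding unless `encoding` is given) and applies universal-newline translation, whereas Python 2's `open()` yields raw byte strings with no decoding. A minimal Python 3 sketch that times both modes on the same file; the helper names `iterate_lines` and `compare_modes` are hypothetical, and the demo writes a small temporary file in the same layout instead of the full 2GB one:

```python
import os
import tempfile
from functools import partial
from timeit import timeit


def iterate_lines(path, mode='r'):
    """Read `path` line by line in the given mode and return the line count."""
    count = 0
    with open(path, mode) as fh:
        for _ in fh:
            count += 1
    return count


def compare_modes(path, number=5):
    """Return total seconds for `number` passes in text ('r') and binary ('rb') mode."""
    results = {}
    for mode in ('r', 'rb'):
        # timeit accepts a zero-argument callable as its statement.
        results[mode] = timeit(partial(iterate_lines, path, mode), number=number)
    return results


if __name__ == '__main__':
    # Hypothetical small stand-in for many_medium.fa: 10,000 records.
    seq_line = 'ATCGN' * 80 + '\n'
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'sample.fa')
        with open(path, 'w') as out_f:
            for i in range(10000):
                out_f.write('>{}\n'.format(i))
                out_f.write(seq_line * 5)
        for mode, seconds in compare_modes(path).items():
            print('{:>2}: {:.3f} s'.format(mode, seconds))
```

If Python 3's `'rb'` timing on the real file comes out close to the Python 2 figure, that would point to decoding as a large part of the gap. In binary mode each line is a `bytes` object, so a check such as `line.startswith('>')` would need to become `line.startswith(b'>')`.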



