I can't work out why parsing this file is so much faster in Python 2.7 than in Python 3.6. I see the same pattern on both macOS and Arch Linux, measured independently. Can others replicate it? Is there an explanation?
Warning: the code snippet writes a ~2 GB file to the current directory.
Timings:
$ python2 test.py
5.01580309868
$ python3 test.py
10.664075019994925
Code for test.py:
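import os
from timeit import timeit

# One sequence line: 400 bases plus a newline.
SEQ_LINE = 'ATCGN' * 80 + '\n'

# Write the test file once: 1,000,000 records, each a header line
# followed by 5 sequence lines (about 2 GB in total).
if not os.path.isfile('many_medium.fa'):
    with open('many_medium.fa', 'w') as out_f:
        for i in range(1000000):
            out_f.write('>{}\n'.format(i))
            for _ in range(5):
                out_f.write(SEQ_LINE)

def f():
    # Iterate over every line without doing any work on it.
    with open('many_medium.fa') as in_f:
        for line in in_f:
            pass

# Total time for 5 full passes over the file.
print(timeit('f()', setup='from __main__ import f', number=5))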
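One experiment that might help narrow this down: in Python 3, open() in the default text mode decodes the bytes to str, while in Python 2 it yields byte strings. Assuming that decoding accounts for part of the gap (a guess, not something I have measured above), timing the same loop in 'r' and 'rb' mode should show it. The sketch below uses a small temporary file rather than the ~2 GB one; make_sample, count_lines and compare_modes are names made up for this illustration.

```python
import os
import tempfile
from timeit import timeit


def make_sample(path, n_records=1000):
    """Write a small FASTA-like file (hypothetical, scaled-down stand-in for many_medium.fa)."""
    seq_line = 'ATCGN' * 80 + '\n'
    with open(path, 'w') as out_f:
        for i in range(n_records):
            out_f.write('>{}\n'.format(i))
            for _ in range(5):
                out_f.write(seq_line)


def count_lines(path, mode):
    """Iterate over the file opened in the given mode and return the line count."""
    n = 0
    with open(path, mode) as in_f:
        for _ in in_f:
            n += 1
    return n


def compare_modes(path, number=5):
    """Return total seconds for `number` passes in text ('r') and binary ('rb') mode."""
    return {
        mode: timeit(lambda: count_lines(path, mode), number=number)
        for mode in ('r', 'rb')
    }


if __name__ == '__main__':
    sample = os.path.join(tempfile.mkdtemp(), 'sample.fa')
    make_sample(sample)
    print(compare_modes(sample))
```

If 'rb' under Python 3 comes out close to the Python 2 timing, that would point at the text layer rather than the raw file reads. The sample here is far too small to give stable numbers, so n_records would need to be raised for a meaningful comparison.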
from Why opening and iterating over file handle over twice as fast in Python 2 vs Python 3?