Monday, 25 January 2021

splitlines() and iterating over an opened file give different results

I have files that sometimes contain unusual end-of-line sequences such as \r\r\n. Iterating over the file opened in binary mode gives exactly what I want:

with open('test.txt', 'wb') as f:
    f.write(b'abc\r\r\ndef')
with open('test.txt', 'rb') as f:
    for l in f:
        print(l)
# b'abc\r\r\n'
# b'def'

I want to be able to get the same result from a string. I tried splitlines(), but it does not give the same result:

print(b'abc\r\r\ndef'.splitlines())
# [b'abc', b'', b'def']

Even with keepends=True, the result is not the same.
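
For reference, this is what keepends=True returns on the same input. Each lone \r still counts as a line boundary, so the terminators are kept but spread over separate items:

```python
data = b'abc\r\r\ndef'

# keepends=True keeps the terminators, but \r, \n and \r\n
# are all treated as line boundaries by bytes.splitlines().
result = data.splitlines(keepends=True)
print(result)
# [b'abc\r', b'\r\n', b'def']
```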

Question: how can I get the same behaviour as for l in f using splitlines()?

Linked: Changing str.splitlines to match file readlines and https://bugs.python.org/issue22232

Note: I don't want to put everything in a BytesIO or StringIO, because that runs at about half the speed (x0.5, already benchmarked); I want to keep a simple string. So this is not a duplicate of How do I wrap a string in a file in Python?.
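
One possible direction, as a minimal sketch only: since iterating over a file opened in binary mode splits on b'\n' alone, the same lines can probably be recovered by searching for b'\n' directly instead of calling splitlines(). The name iter_lines is a hypothetical helper, not something from the standard library, and the sketch assumes bytes input; it has not been benchmarked against BytesIO.

```python
def iter_lines(data: bytes):
    # Hypothetical helper: yield lines terminated by a line feed only,
    # keeping the terminator, as binary-mode file iteration does.
    start = 0
    while True:
        end = data.find(b"\n", start)
        if end == -1:
            break
        # Include the line feed itself in the yielded line.
        yield data[start:end + 1]
        start = end + 1
    # Trailing data without a final line feed is yielded as the last line.
    if start < len(data):
        yield data[start:]


lines = list(iter_lines(b'abc\r\r\ndef'))
print(lines)
# [b'abc\r\r\n', b'def']
```

Unlike splitlines(), this leaves every \r untouched, so \r\r\n stays attached to the line it ends.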



