How come the following works without any errors in Python?
>>> '你好'.encode('UTF-8').decode('ISO8859-1')
'ä½\xa0å¥½'
>>> _.encode('ISO8859-1').decode('UTF-8')
'你好'
I would have expected it to fail with a UnicodeEncodeError or UnicodeDecodeError.
Is there some property of ISO8859-1 and UTF-8 such that I can take any UTF-8 encoded byte string, decode it as ISO8859-1, and later reverse the process to get the original string back?
I'm working with an older database that only supports the ISO8859-1 character set. It seems like the developers were able to store Chinese and other languages in this database by decoding UTF-8 encoded strings into ISO8859-1, and storing the resulting garbage string in the database. Downstream systems which query this database then have to encode the garbage string in ISO8859-1 and then decode the result with UTF-8 to get the correct string.
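To make that process concrete, here is a minimal sketch of what I understand the two sides to be doing. The function names to_legacy and from_legacy are hypothetical, made up for illustration; they are not taken from the actual systems.

```python
def to_legacy(text: str) -> str:
    # Hypothetical writer side: reinterpret the UTF-8 bytes as ISO8859-1,
    # producing one (garbage) character per byte.
    return text.encode('UTF-8').decode('ISO8859-1')


def from_legacy(stored: str) -> str:
    # Hypothetical reader side: turn the garbage characters back into the
    # same bytes, then decode those bytes as UTF-8.
    return stored.encode('ISO8859-1').decode('UTF-8')


original = '你好'
stored = to_legacy(original)
restored = from_legacy(stored)

# 2 characters become 6, one per UTF-8 byte, and the round trip is lossless.
print(len(original), len(stored), restored == original)  # 2 6 True
```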
I would have assumed that such a process would not work at all.
What am I missing?
from How come I can decode a UTF-8 byte string to ISO8859-1 and back again without any UnicodeEncodeError/UnicodeDecodeError?