Monday, 25 November 2019

Wrong encoding even when specifying encoding to pandas

I have a CSV file that contains accented characters. I checked its encoding by opening it in PyCharm and Sublime Text: it's Western (Windows-1252, or ISO-8859-1).
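For reference, here is how one accented character from the file is stored under the encodings mentioned above. This is a minimal stdlib sketch. The sample string comes from the question; nothing else is taken from the original data.

```python
# Byte representation of "î" in the legacy Western encodings vs. UTF-8.
text = "s'il vous plaît"

cp1252_bytes = text.encode("cp1252")
latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

# Windows-1252 and ISO-8859-1 agree on "î": a single byte, 0xEE.
print(cp1252_bytes[-2:])  # b'\xeet'
# UTF-8 needs two bytes for the same character: 0xC3 0xAE.
print(utf8_bytes[-3:])    # b'\xc3\xaet'
```

A file written with the single-byte form is therefore not valid UTF-8 wherever such a character appears.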

I create a pandas DataFrame from this CSV, modify it, and export it to a UTF-8 text file. However, when I check the exported file in PyCharm and Sublime Text, it is not in UTF-8, and I don't know why.

Here is my code:

import pandas as pd

dataset = pd.read_csv("my_file.csv", sep=";", encoding="ISO-8859-1")
print(dataset.loc[0, "my_col"])
# Output: s'il vous plaît

# Export data
with open("out.txt", "w", newline='') as f:
    dataset.to_csv(path_or_buf=f, sep="\t", header=False, index=False, encoding="utf-8")
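One thing worth checking (this is an assumption about the cause, not a confirmed diagnosis). When open() is called without an encoding argument, it uses the platform's locale encoding, which is often cp1252 on Windows. The pandas to_csv documentation also notes that its encoding argument is not supported when path_or_buf is a non-binary file object. The stdlib-only sketch below shows that the bytes on disk are decided by the text handle's own encoding. The file names are hypothetical.

```python
import os
import tempfile

text = "s'il vous plaît\n"

with tempfile.TemporaryDirectory() as tmp:
    # Encoding that open() picks when none is given (platform-dependent).
    with open(os.path.join(tmp, "default.txt"), "w") as f:
        default_enc = f.encoding
    print("open() default:", default_enc)

    results = {}
    for enc in ("cp1252", "utf-8"):
        path = os.path.join(tmp, f"out_{enc}.txt")  # hypothetical file names
        # The handle's encoding is what turns str into bytes on write.
        with open(path, "w", encoding=enc, newline="") as f:
            f.write(text)
        with open(path, "rb") as f:
            results[enc] = f.read()

print(results["cp1252"])  # b"s'il vous pla\xeet\n"
print(results["utf-8"])   # b"s'il vous pla\xc3\xaet\n"
```

If this is the cause, opening the file with encoding="utf-8", or passing the path string itself to to_csv, should produce UTF-8 output. This has not been tested against the original data.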

When I open "out.txt" in PyCharm, it shows s'il vous pla�t, and PyCharm tells me that the file's encoding is not UTF-8.
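The replacement character � is what a UTF-8 decoder typically substitutes for bytes that do not form a valid sequence. The small sketch below assumes that the file holds the single cp1252 byte 0xEE for "î", and it reproduces what the editor displays. The helper name is_valid_utf8 is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


cp1252_bytes = "s'il vous plaît".encode("cp1252")

# 0xEE starts a 3-byte UTF-8 sequence, but "t" is not a continuation
# byte, so the decoder replaces 0xEE with U+FFFD and carries on.
shown = cp1252_bytes.decode("utf-8", errors="replace")
print(ascii(shown))                            # "s'il vous pla\ufffdt"
print(is_valid_utf8(cp1252_bytes))             # False
print(is_valid_utf8("plaît".encode("utf-8")))  # True
```

To check the real export independently of the editor, something like is_valid_utf8(pathlib.Path("out.txt").read_bytes()) could be used.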