Hemant Vishwakarma: Process Unicode strings in Python

Tuesday, 16 April 2019

I am using a fastText pre-trained model based on English Wikipedia. It works as expected...

But when I try the same code with another language, I get the error shown on this page...

The error is related to Unicode:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 15: invalid start byte
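This message usually means the bytes being read are not UTF-8 text at all. The sketch below reproduces the same message on made-up data; the sample bytes are hypothetical and are not taken from the actual model file.

```python
# Hypothetical sample: 15 ASCII bytes followed by 0x80.
# 0x80 is a continuation byte, so it can never start a UTF-8 sequence.
data = b"a" * 15 + b"\x80" + b"rest"

try:
    data.decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    # exc.start holds the offset of the first undecodable byte.
    message = str(exc)
    bad_offset = exc.start

print(message)
```

Opening a file in text mode with the UTF-8 codec performs the same decoding step, so a file containing arbitrary binary data fails in the same way.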

I tried opening the file in raw binary mode. I changed the function load_words_raw in the load.py file:

with open(file_path, 'rb') as f:

And now I get a different error:

ValueError: could not convert string to float: b'\x00l\x02'
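Binary mode only skips the decoding step. The code that follows still receives raw bytes and tries to parse them as numbers. A minimal sketch, assuming the loader expects text lines of the form "word v1 v2 ..."; the helper parse_vector_line is hypothetical and is not the actual code in load.py:

```python
def parse_vector_line(line):
    """Hypothetical parser for one 'word v1 v2 ...' line of a text vectors file."""
    parts = line.rstrip().split(" ")
    return parts[0], [float(x) for x in parts[1:]]


# A well-formed text line parses cleanly.
word, vector = parse_vector_line("hello 0.1 -0.2 0.3")

# Bytes taken from a binary file are not the textual form of a number,
# so float() rejects them with a ValueError.
try:
    float(b"\x00l\x02")
    error = None
except ValueError as exc:
    error = str(exc)

print(word, vector)
print(error)
```

If this assumption holds, the second error suggests that the file is not in the line-oriented text format at all, so changing the open mode alone would not help.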

I have no idea how to handle this.
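One way to narrow this down is to check whether the file is UTF-8 text before parsing it line by line. The sketch below uses only the standard library. The function name looks_like_utf8_text, the sample size and the demo file names are all assumptions made for illustration.

```python
import codecs
import os
import tempfile


def looks_like_utf8_text(file_path, sample_size=4096):
    """Heuristic: True if the first bytes decode as UTF-8 and contain no NUL."""
    with open(file_path, "rb") as f:
        sample = f.read(sample_size)
    # An incremental decoder tolerates a multi-byte character cut off
    # at the end of the sample.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    # NUL bytes are valid UTF-8 but very unlikely in a text vectors file.
    return b"\x00" not in sample


# Demo on two made-up files: one text, one binary.
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "vectors.txt")
    binary_path = os.path.join(tmp, "model.bin")
    with open(text_path, "wb") as f:
        f.write("नमस्ते 0.1 0.2\n".encode("utf-8"))
    with open(binary_path, "wb") as f:
        f.write(b"\x00l\x02\x80")
    text_result = looks_like_utf8_text(text_path)
    binary_result = looks_like_utf8_text(binary_path)

print(text_result, binary_result)
```

If the check returns False, the file is probably a binary model rather than a text vectors file. It would then need the library's own model loader (for example fasttext.load_model in the official Python package) instead of a line-by-line reader. This is a guess about the file involved, which the post does not confirm.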

Hemant Vishwakarma