Friday, 13 October 2023

Why does an operation on a large integer silently overflow?

I have a list that contains very large integers and I want to cast it into a pandas column with a specific dtype. As an example, if the list contains 2**31, which is outside the limit of int32 dtype, casting it into dtype int32 throws an Overflow Error, which lets me know to use another dtype or handle the number in some other way beforehand.

import pandas as pd
pd.Series([2**31], dtype='int32')

# OverflowError: Python int too large to convert to C long

But if a number is large but inside the dtype limits (i.e. 2**31-1), and some number is added to it which results in a value that is outside the dtype limits, then instead of an OverflowError, the operation is executed without any errors, yet the value is now inverted, becoming a completely wrong number for the column.

pd.Series([2**31-1], dtype='int32') + 1

0   -2147483648
dtype: int32

Why is it happening? Why doesn’t it raise an error like the first case?

PS. I'm using pandas 2.1.1 and numpy 1.26.0 on Python 3.12.0.



from Why does an operation on a large integer silently overflow?

No comments:

Post a Comment