Thursday 12 September 2019

Trouble with df.join(): ValueError: You are trying to merge on object and int64 columns

None of these questions adress the issue: Question 1 and Question 2 nor could I find the answer in pandas documentation.

Hello, I am trying to find the underlying cause for this error:

ValueError: You are trying to merge on object and int64 columns.

I know I can work around this problem using pandas concat or merge function, but I am trying to understand the cause for the error. The question is: Why do I get this ValueError?

Here's the output of the head(5) and info() on both dataframes that are used.

print(the_big_df.head(5)) Output:

  account  apt  apt_p  balance       date  day    flag  month  reps     reqid  year
0  AA0420    0    0.0  -578.30 2019-03-01    1       1      3    10  82f2d761  2019
1  AA0420    0    0.1  -578.30 2019-03-02    2       1      3    10  82f2d761  2019
2  AA0420    0    0.1  -578.30 2019-03-03    3       1      3    10  82f2d761  2019
3  AA0421    0    0.1  -607.30 2019-03-04    4       1      3    10  82f2d761  2019
4  AA0421    0    0.1  -610.21 2019-03-05    5       1      3    10  82f2d761  2019

print(the_big_df.info()) Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36054 entries, 0 to 36053
Data columns (total 11 columns):
account        36054 non-null object
apt            36054 non-null int64
apt_p          36054 non-null float64
balance        36054 non-null float64
date           36054 non-null datetime64[ns]
day            36054 non-null int64
flag           36054 non-null int64
month          36054 non-null int64
reps           36054 non-null int32
reqid          36054 non-null object
year           36054 non-null int64
dtypes: datetime64[ns](1), float64(2), int32(1), int64(5), object(2)
memory usage: 3.2+ MB

Here's the dataframe I'm passing to the join(); print(df_to_join.head(5)):

      reqid     id
0  54580f39  13301
1  3ba905c0  77114
2  5f2d80da  13302
3  a1478e98  77115
4  9b09854b  78598

print(df_to_join.info()) Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14332 entries, 0 to 14331
Data columns (total 2 columns):
reqid    14332 non-null object
dni      14332 non-null object

The exact next line after the 4 prints stated above is:

the_max_df = the_big_df.join(df_to_join,on='reqid')

And the output is, as stated above:

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

Why does this happen, when before it is clearly stated that column reqid is an object in both dataframes? Thanks.



from Trouble with df.join(): ValueError: You are trying to merge on object and int64 columns

No comments:

Post a Comment