Friday, 15 October 2021

How to speed up pandas dataframe iteration involving 2 different dataframes with a complex condition?

I have a pandas dataframe A of approximately 300000 rows. Each row has a latitude and longitude value.

I also have a second pandas dataframe B of about 10000 rows, which has an ID number, a maximum and minimum latitude, and a maximum and minimum longitude.

For each row in A, I need the ID of the corresponding row in B, such that the latitude and longitude of the row in A is contained within the bounding box represented by the row in B.

So far I have the following:

ID_list = []

for index, row in A.iterrows():
    filtered_B = B.apply(lambda x : x['ID'] if row['latitude'] >= x['min_latitude']
                                            and row['latitude'] < x['max_latitude'] \
                                            and row['longitude'] >= x['min_longitude'] \
                                            and row['longitude'] < x['max_longitude'] \
                                            else None, axis = 1)
    ID_list.append(B.loc[filtered_B == True]['ID']

The ID_list variable was created with the intention of adding it as an ID column to A. The greater than or equal to and less than conditions are included so that each row in A has only one ID from B.

The above code technically works, but it completes about 1000 rows per minute, which is just not feasible for such a large dataset.

Any tips would be appreciated, thank you.

edit: sample dataframes:

A:

location latitude longitude
1 -33.81263 151.23691
2 -33.994823 151.161274
3 -33.320154 151.662009
4 -33.99019 151.1567332

B:

ID min_latitude max_latitude min_longitude max_longitude
9ae8704 -33.815 -33.810 151.234 151.237
2ju1423 -33.555 -33.543 151.948 151.957
3ef4522 -33.321 -33.320 151.655 151.668
0uh0478 -33.996 -33.990 151.152 151.182

expected output:

ID_list = [9ae8704, 0uh0478, 3ef4522, 0uh0478]


from How to speed up pandas dataframe iteration involving 2 different dataframes with a complex condition?

No comments:

Post a Comment