Wednesday, 25 August 2021

Efficient way to sort and concatenate the dataframes

I have following two dataframes:

>>> df1
  c1  c2  v1  v2
0  A NaN   9   2
1  B NaN   2   5
2  C NaN   3   5
3  D NaN   4   2

>>> df2
   c1 c2  v1  v2
0   A  P   4   1
1   A  T   3   1
2   A  Y   2   0
3   B  P   0   1
4   B  T   2   2
5   B  Y   0   2
6   C  P   1   2
7   C  T   1   2
8   C  Y   1   1
9   D  P   1   1
10  D  T   2   0
11  D  Y   1   1

I need to concatenate the dataframes and sort them or vice versa. The first dataframe needs to be sorted on v1 column, then the second dataframe needs to be sorted based on the order of the values from c1 column after sorting the first dataframe, and the v2 column from the second dataframe.

A working version is something like this: sorting first dataframe on v1, then iterating the rows, and filtering the second dataframe for the value of c2 column, and sorting the filtered second dataframe on v2, finally concatenating all the frames.

result = []
for i,row in df1.sort_values('v1').iterrows():
    result.append(row.to_frame().T)
    result.append(df2[df2['c1'].eq(row['c1'])].sort_values('v2'))

The resulting dataframe after sorting:

>>> pd.concat(result, ignore_index=True)
   c1   c2 v1 v2
0   B  NaN  2  5
1   B    P  0  1
2   B    T  2  2
3   B    Y  0  2
4   C  NaN  3  5
5   C    Y  1  1
6   C    P  1  2
7   C    T  1  2
8   D  NaN  4  2
9   D    T  2  0
10  D    P  1  1
11  D    Y  1  1
12  A  NaN  9  2
13  A    Y  2  0
14  A    P  4  1
15  A    T  3  1

The problem with above approach is its iterative, and not so efficient when the number of dataframes increases and/or the number of rows increases in these dataframes. The real use-case scenario has from 2 to 6 dataframes, where number of rows ranges from few thousands to hundred thousands.

UPDATE:

Either of sorting the dataframes first then concatenating them, or concatenating the datframes first then sorting, will be fine, that is why I just included both the dataframes instead of just concatenating them and presenting a single dataframe.



from Efficient way to sort and concatenate the dataframes

No comments:

Post a Comment