I have following two dataframes:
>>> df1
c1 c2 v1 v2
0 A NaN 9 2
1 B NaN 2 5
2 C NaN 3 5
3 D NaN 4 2
>>> df2
c1 c2 v1 v2
0 A P 4 1
1 A T 3 1
2 A Y 2 0
3 B P 0 1
4 B T 2 2
5 B Y 0 2
6 C P 1 2
7 C T 1 2
8 C Y 1 1
9 D P 1 1
10 D T 2 0
11 D Y 1 1
I need to concatenate the dataframes and sort them or vice versa. The first dataframe needs to be sorted on v1
column, then the second dataframe needs to be sorted based on the order of the values from c1
column after sorting the first dataframe, and the v2
column from the second dataframe.
A working version is something like this: sorting first dataframe on v1
, then iterating the rows, and filtering the second dataframe for the value of c2
column, and sorting the filtered second dataframe on v2
, finally concatenating all the frames.
result = []
for i,row in df1.sort_values('v1').iterrows():
result.append(row.to_frame().T)
result.append(df2[df2['c1'].eq(row['c1'])].sort_values('v2'))
The resulting dataframe after sorting:
>>> pd.concat(result, ignore_index=True)
c1 c2 v1 v2
0 B NaN 2 5
1 B P 0 1
2 B T 2 2
3 B Y 0 2
4 C NaN 3 5
5 C Y 1 1
6 C P 1 2
7 C T 1 2
8 D NaN 4 2
9 D T 2 0
10 D P 1 1
11 D Y 1 1
12 A NaN 9 2
13 A Y 2 0
14 A P 4 1
15 A T 3 1
The problem with above approach is its iterative, and not so efficient when the number of dataframes increases and/or the number of rows increases in these dataframes. The real use-case scenario has from 2 to 6 dataframes, where number of rows ranges from few thousands to hundred thousands.
UPDATE:
Either of sorting the dataframes first then concatenating them, or concatenating the datframes first then sorting, will be fine, that is why I just included both the dataframes instead of just concatenating them and presenting a single dataframe.
from Efficient way to sort and concatenate the dataframes
No comments:
Post a Comment