Wednesday, 13 February 2019

Efficient python pandas equivalent/implementation of R sweep with multiple arguments

Other questions attempting to provide the Python equivalent of R's sweep function do not really address the case of multiple arguments, where it is most useful.

Say I wish to apply a two-argument function to each row of a DataFrame, pairing each row with the matching element from a column of another DataFrame:

df = data.frame("A" = 1:3,"B" = 11:13)
df2= data.frame("X" = 10:12,"Y" = 10000:10002)
sweep(df,1, FUN="*",df2$X)

In Python I got the equivalent by using apply over what is basically a loop through the row positions.

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": range(1, 4), "B": range(11, 14)})
df2 = pd.DataFrame({"X": range(10, 13), "Y": range(10000, 10003)})
# Loop over row positions, multiplying each row of df by the matching value of df2["X"]
pd.Series(range(df.shape[0])).apply(lambda row_count: np.multiply(df.iloc[row_count, :], df2.iloc[row_count, df2.columns.get_loc("X")]))

I highly doubt this is efficient in pandas. What is a better way of doing this?

Both bits of code should result in a DataFrame/matrix of 6 numbers when applying *:

   A   B
1 10 110
2 22 132
3 36 156
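
For the plain multiplication case, one candidate that avoids the explicit loop is pandas' DataFrame.mul with axis=0, which matches the Series to the rows by index label. This is only a minimal sketch, assuming df and df2 share the same default integer index as in the example above; the variable name result is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 4), "B": range(11, 14)})
df2 = pd.DataFrame({"X": range(10, 13), "Y": range(10000, 10003)})

# axis=0 aligns df2["X"] with the row index of df, so every column
# of row i is multiplied by df2["X"][i]
result = df.mul(df2["X"], axis=0)
print(result)
```

The printed values should match the R output above, apart from pandas labelling the rows 0 to 2 instead of 1 to 3.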

I should state clearly that the aim is to insert one's own function into this sweep-like behavior, for example:

df = data.frame("A" = 1:3,"B" = 11:13)
df2= data.frame("X" = 10:12,"Y" = 10000:10002)
myFunc = function(a,b) { floor((a + b)^min(a/2,b/3))  }
sweep(df,1, FUN=myFunc,df2$X)

resulting in:

     A B
[1,] 3 4
[2,] 3 4
[3,] 3 5

What is a good way of doing that in Python with pandas?
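
One possible direction, sketched here rather than offered as a definitive answer, is to mimic what sweep does internally: expand the per-row values to the shape of the data and call the function once on whole arrays. The helper sweep_rows and the Python port my_func are hypothetical names introduced for this sketch. It assumes the function can be written with NumPy array operations, and that R's min() in myFunc is meant as a single minimum over all elements of both arguments, which is how it behaves when sweep passes whole arrays.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": range(1, 4), "B": range(11, 14)})
df2 = pd.DataFrame({"X": range(10, 13), "Y": range(10000, 10003)})

def sweep_rows(frame, stats, func):
    """Hypothetical helper: call func(x, s) once, with stats matched to rows."""
    x = frame.to_numpy()
    # Repeat the per-row stats across all columns so s has the same shape as x
    s = np.broadcast_to(np.asarray(stats)[:, None], x.shape)
    return pd.DataFrame(func(x, s), index=frame.index, columns=frame.columns)

def my_func(a, b):
    # Assumed port of the R myFunc: one scalar exponent taken as the
    # minimum over every element of a/2 and b/3
    exponent = min((a / 2).min(), (b / 3).min())
    return np.floor((a + b) ** exponent)

swept = sweep_rows(df, df2["X"], my_func)
print(swept)
```

If an elementwise minimum (R's pmin) were intended instead, np.minimum(a / 2, b / 3) would replace the scalar exponent, and the results would differ from the R output shown above.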



