Friday 30 October 2020

Python numpy groupby multiple columns

Is there a way to do a group-by aggregation over multiple columns in NumPy? I'm trying to do it with this module: https://github.com/ml31415/numpy-groupies. The goal is to get a faster group-by than pandas. For example:

import numpy as np
from numpy_groupies import aggregate

# One row of group keys per grouping column
group_idx = np.array([
    np.array([4, 3, 3, 4, 4, 1, 1, 1, 7, 8, 7, 4, 3, 3, 1, 1]),
    np.array([4, 3, 2, 4, 7, 1, 4, 1, 7, 8, 7, 2, 3, 1, 14, 1]),
    np.array([1, 2, 3, 4, 5, 1, 1, 2, 3, 4, 5, 4, 2, 3, 1, 1])
])
a = np.array([1, 2, 1, 2, 1, 2, 1, 2, 3, 4, 5, 4, 2, 3, 1, 1])

result = aggregate(group_idx, a, func='sum')

The result should match what pandas gives for df.groupby(['column1', 'column2', 'column3']).sum().reset_index()
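For comparison, here is a minimal sketch of one way to get a flat, pandas-like result with plain NumPy. It is not the numpy-groupies approach. The helper name groupby_sum and the small sample data are made up for illustration. The idea is to treat each combination of keys as a row, let np.unique find the distinct rows, and sum the values per row with np.bincount.

```python
import numpy as np

def groupby_sum(keys, values):
    """Sum `values` per unique key combination (hypothetical helper).

    keys   : array of shape (n_columns, n_rows), one row per grouping column
    values : array of shape (n_rows,)
    """
    keys = np.asarray(keys)
    # Transpose so that each row holds one key combination, then find
    # the distinct rows and the group number of every original row.
    unique_rows, inverse = np.unique(keys.T, axis=0, return_inverse=True)
    # ravel() keeps this working if the inverse comes back with an extra axis.
    sums = np.bincount(inverse.ravel(), weights=values,
                       minlength=len(unique_rows))
    return unique_rows, sums

# Small illustrative data, not the arrays from the question
keys = np.array([
    [4, 3, 3, 4, 1, 1],
    [4, 3, 3, 4, 1, 1],
    [1, 2, 2, 4, 1, 1],
])
a = np.array([1, 2, 2, 2, 2, 1])

unique_rows, sums = groupby_sum(keys, a)
for row, total in zip(unique_rows, sums):
    print(row, total)
```

The unique key rows come back sorted lexicographically, which is similar to the default sorted output of a pandas groupby. Note that np.bincount with weights returns floats, so cast the sums back to an integer dtype if needed. Whether this is actually faster than pandas depends on the data and would need to be measured.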


