I have a list of list of Data Frames, each corresponding to a different time period t from 0 to N. Each data frame has multiple types, i need to preform the finally for each type in the data frame.
An example data set would be as follows, i made each df in the list the same values for simplicity but the calculation would remain the same.
d = {'type': ['a', 'b', 'c'], 'x': [1, 2, 3], 'y':[3,4,5]}
df = pd.DataFrame(data=d)
l = []
window = 20
[l.append(df.copy(deep=True)) for i in range(window)]
I need to compute a vectored sum $$\sum$$
and product $$\prod$$
using the above list of dataframes for each type (a, b, c) in an efficient manner. e.g. for each calculation below I need to filter by a single type df[df['type'] == 'a']
for every df in the list.
If i use a df.groupby('type')
on every df in the list it could be very slow using a larger data set. The same goes if i use nested for loops for the sum and the product and filtering by type in each iteration of the for loop. How can I compute this sum product in an efficient manner ?
Update
One possible way as suggested in the comments is below, however this will start to be very inefficient if I use a larger dataset or window:
import pandas as pd
import math
d = {'type': ['a', 'b', 'c'], 'x': [1, 2, 3], 'y':[3, 4, 5]}
df = pd.DataFrame(data=d)
l = []
window = 20
for i in range(window):
df['window'] = i
l.append(df.copy(deep=True))
df = pd.concat(l)
sums = {}
types = df['type'].unique()
for s in types:
sums[s] = 0
tdf = df[df['type'] == s]
for i in range(window):
sums[s] += tdf[tdf['window'] == i]['x'].values[0] * math.prod(2 - tdf[tdf['window'] == i+1]['x'].values[0] for i in range(0, window-1)) * tdf[tdf['window'] == i]['y'].values[0]
from Vectorized sum and product with list of pandas data frames
No comments:
Post a Comment