Thursday, 2 January 2020

Applying group-specific function that returns a single series

I'm trying to figure out an efficient split/apply/combine scheme for the following scenario. Consider the pandas dataframe demoAll defined below:

import datetime
import pandas as pd


demoA = pd.DataFrame({'date':[datetime.date(2010,1,1), datetime.date(2010,1,2), datetime.date(2010,1,3)],
                     'ticker':['A', 'A', 'A'],
                     'x1':[10,20,30],
                     'close':[120, 133, 129]}).set_index('date', drop=True)
demoB = pd.DataFrame({'date':[datetime.date(2010,1,1), datetime.date(2010,1,2), datetime.date(2010,1,3)],
                     'ticker':['B', 'B', 'B'],
                     'x1':[18,11,45],
                     'close':[50, 49, 51]}).set_index('date', drop=True)
demoAll = pd.concat([demoA, demoB])
print(demoAll)

The result is:

           ticker  x1  close
date                        
2010-01-01      A  10    120
2010-01-02      A  20    133
2010-01-03      A  30    129
2010-01-01      B  18     50
2010-01-02      B  11     49
2010-01-03      B  45     51

I also have a dictionary mapping tickers to model objects:

ticker2model = {'A':model_A, 'B':model_B,...}

where each model has a predict(df) method that takes an entire dataframe and returns a series of the same length.
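To make the interface concrete, here is a minimal sketch of what such a model could look like. ScaledCloseModel, its factor parameter, and the example data are hypothetical stand-ins for illustration only; the real models are not shown in this post.

```python
import pandas as pd


class ScaledCloseModel:
    """Hypothetical stand-in for a fitted per-ticker model."""

    def __init__(self, factor):
        self.factor = factor

    def predict(self, df):
        # One prediction per input row, carrying the input's index.
        return (df['close'] * self.factor).rename('prediction')


# Small illustrative frame (made-up column values reused from ticker A).
example_df = pd.DataFrame({'x1': [10, 20, 30], 'close': [120, 133, 129]})
example_pred = ScaledCloseModel(0.5).predict(example_df)
```

The only property that matters for the question is that predict returns exactly one value per input row.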

I would now like to create a new column, demoAll['predictions'], holding these predictions. What is the cleanest and most efficient way of doing this? A few things to note:

1.) demoAll is the concatenation of ticker-specific dataframes that were each indexed only by date, so the index of demoAll is not unique. (However, the combination of date and ticker IS unique.)

2.) My thinking has been to do something like the example below, but I am running into issues with indexing, data-type coercion, and slow run times. The real dataset is quite large (in both rows and columns).

demoAll['predictions'] = demoAll.groupby('ticker').apply(
    lambda x: ticker2model[x.name].predict(x)
)
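One possible way to sidestep the index-alignment problem is to write each group's predictions back by integer position rather than by label, so the non-unique date index never takes part in the assignment. The sketch below is only that, a sketch: ScaleModel and the two factors are hypothetical stand-ins for the real models, and it assumes the predictions are numeric so they fit in a float array.

```python
import datetime

import numpy as np
import pandas as pd


class ScaleModel:
    """Hypothetical stand-in model: predicts close * factor."""

    def __init__(self, factor):
        self.factor = factor

    def predict(self, df):
        return df['close'] * self.factor


# Rebuild the example frame with its non-unique date index.
dates = [datetime.date(2010, 1, d) for d in (1, 2, 3)]
demoAll = pd.concat([
    pd.DataFrame({'ticker': 'A', 'x1': [10, 20, 30], 'close': [120, 133, 129]}, index=dates),
    pd.DataFrame({'ticker': 'B', 'x1': [18, 11, 45], 'close': [50, 49, 51]}, index=dates),
])
demoAll.index.name = 'date'

# Hypothetical mapping with the same shape as ticker2model.
ticker2model = {'A': ScaleModel(2), 'B': ScaleModel(10)}

# Assumption: predictions are numeric, so a float buffer can hold them.
preds = np.empty(len(demoAll), dtype=float)

# GroupBy.indices maps each ticker to the integer positions of its rows.
for ticker, pos in demoAll.groupby('ticker').indices.items():
    group_pred = ticker2model[ticker].predict(demoAll.iloc[pos])
    # Drop the labels and write by position to avoid index alignment.
    preds[pos] = np.asarray(group_pred)

demoAll['predictions'] = preds
```

Because the result is a plain array of the same length as demoAll, the final assignment is positional. Each model is still called once per ticker, so the run time is likely dominated by the predict calls themselves rather than by the bookkeeping.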

