Monday 16 July 2018

pandas: Composition for chained methods like .resample(), .rolling() etc

I would like to construct an extension of pandas.DataFrame (let's call it SPDF) that can do things above and beyond what a plain DataFrame can. Rather than subclassing, I wrap the DataFrame (composition) and forward attribute lookups to it:

import pandas as pd
import numpy as np


def to_spdf(func):
    """Transform generic output of `func` to SPDF.

    Returns
    -------
    wrapper : callable
    """
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        return SPDF(res)

    return wrapper


class SPDF:
    """Special-purpose dataframe.

    Parameters
    ----------
    df : pandas.DataFrame

    """

    def __init__(self, df):
        self.df = df

    def __repr__(self):
        return repr(self.df)

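    def __getattr__(self, item):
        # Only called for attributes SPDF itself lacks: look them up on the
        # wrapped DataFrame, and wrap methods so that their output becomes
        # an SPDF.
        res = getattr(self.df, item)

        if callable(res):
            res = to_spdf(res)

        return res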


if __name__ == "__main__":

    # construct a generic SPDF
    df = pd.DataFrame(np.eye(4))
    an_spdf = SPDF(df)

    # call .diff() to obtain another SPDF
    print(an_spdf.diff())

Right now, DataFrame methods that return another DataFrame, such as .diff() in the MWE above, give me back another SPDF, which is great. However, I would also like to get chained calls such as .resample('M').last() or .rolling(2).mean() to produce an SPDF at the very end. I have failed so far: .rolling() and the like are themselves callables, so to_spdf wraps whatever they return (a Rolling or Resampler object, not a DataFrame) into an SPDF straight away, without 'waiting' for .mean() or whatever the last part of the expression is. Any ideas how to tackle this problem?

Thanks.


