I am using composition to create a class that wraps a pandas DataFrame, as shown below. The class exposes a derived
property that is computed from the base columns.
    import numpy as np
    import pandas as pd

    class myclass:
        def __init__(self, *args, **kwargs):
            # Wrap a DataFrame instead of subclassing it (composition)
            self.df = pd.DataFrame(*args, **kwargs)

        @property
        def derived(self):
            # Derived column: row-wise sum of the base columns
            return self.df.sum(axis=1)

    myobj = myclass(np.random.randint(100, size=(100, 6)))
    d = myobj.derived
Calculating derived is expensive, so I would like to cache its result with functools.lru_cache. However, lru_cache requires its arguments, here the object itself, to be hashable. I tried adding a __hash__
method to the class as detailed in this answer: https://stackoverflow.com/a/47800021/3679377.
Now I run into a new problem: the hash function itself is expensive. Is there any way around this, or have I reached a dead end?
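To make the problem concrete, here is a minimal sketch of the setup described above. It assumes a content-based hash built with pandas.util.hash_pandas_object; the linked answer may do this differently. The class name HashedFrame and the hash_calls counter are hypothetical and exist only for illustration.

```python
import functools

import numpy as np
import pandas as pd


class HashedFrame:
    """Hypothetical wrapper whose hash depends on the DataFrame contents."""

    hash_calls = 0  # illustration only: counts how often __hash__ runs

    def __init__(self, *args, **kwargs):
        self.df = pd.DataFrame(*args, **kwargs)

    def __hash__(self):
        # Content-based hash: every cell is read, so the cost grows
        # with the size of the DataFrame.
        HashedFrame.hash_calls += 1
        row_hashes = pd.util.hash_pandas_object(self.df, index=True)
        return hash(row_hashes.values.tobytes())

    def __eq__(self, other):
        return isinstance(other, HashedFrame) and self.df.equals(other.df)

    @property
    @functools.lru_cache(maxsize=None)
    def derived(self):
        # lru_cache uses `self` as the cache key, so it hashes the
        # object on every access, including cache hits.
        return self.df.sum(axis=1)


obj = HashedFrame(np.arange(12).reshape(4, 3))
first = obj.derived   # cache miss: hashes the frame, then sums the rows
second = obj.derived  # cache hit: still hashes the frame to find the entry
```

The cached result is reused on the second access, but the hash is still recomputed each time, which is where the cost moves to.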
Is there a better way to check whether a dataframe has been modified and, if it has not, keep returning the same hash?
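One possible direction, sketched here only as an assumption and not as an established answer, is to skip hashing altogether and invalidate a stored result whenever the data is written. This only works if every write goes through the wrapper; the class VersionedFrame and its set_value method are hypothetical names, and changes made directly on the inner DataFrame would not be detected.

```python
import numpy as np
import pandas as pd


class VersionedFrame:
    """Hypothetical wrapper that caches `derived` and clears it on writes."""

    def __init__(self, *args, **kwargs):
        self._df = pd.DataFrame(*args, **kwargs)
        self._derived_cache = None  # None means "needs recomputing"

    def set_value(self, row, col, value):
        # All writes are assumed to pass through here, so the cache
        # can be invalidated without inspecting the data.
        self._df.iloc[row, col] = value
        self._derived_cache = None

    @property
    def derived(self):
        if self._derived_cache is None:
            self._derived_cache = self._df.sum(axis=1)
        return self._derived_cache


vf = VersionedFrame(np.arange(12).reshape(4, 3))
d1 = vf.derived          # computed once
d2 = vf.derived          # returned from the stored result, no hashing
vf.set_value(0, 0, 100)  # write through the wrapper clears the cache
d3 = vf.derived          # recomputed after the change
```

The trade-off is that correctness depends on discipline: the check for "has it been modified" becomes a constant-time flag, but only for modifications the wrapper knows about.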
from Hashing a pandas dataframe for calculated column caching