Here is a sample of the input pandas dataframe:
**LastUpdate** **Whatever** ...
2017-12-30 xxx ...
2017-12-30 yyy ...
2017-12-30 zzz ...
2018-01-01 yyy ...
2018-01-03 zzz ...
Here is the expected DF (output):
**LastUpdate** **Whatever** ...
2017-12-30 xxx ...
2017-12-30 yyy ...
2017-12-30 zzz ...
2017-12-31 xxx ...
2017-12-31 yyy ...
2017-12-31 zzz ...
2018-01-01 yyy ...
2018-01-02 yyy ...
2018-01-03 zzz ...
As you can see, the missing days in the data will simply duplicate previous day's rows so that I'm simply filling the missing days with (all) previous day data. The thing is that the number of rows per day might differ, so that's not really helping.
Important note: there may be more than only a day missing between two days (it could go from 2018-01-01 to 2018-01-05 so I would need to add all the missing days between these two days with the same data (with the exact same number of rows/content) as for the 2018-01-01, being the last day with data available.
I've made some research and came up with the resample, ffill and reset_index methods but it looks like it won't fit my specific case as it requires a unique date index, which is not the case here as one day may have several rows associated.
What I've tried so far:
df['Last Update'] = pd.to_datetime(df['Last Update'])
df.set_index("Last Update", inplace=True)
dfResult = df.resample('D').ffill().reset_index()
which yields cannot reindex a non-unique index with a method or limit (and that totally makes sense) but I really can't figure out a way to achieve what I'm trying to do. Let me know if anything is unclear or if you need any more additional information, any help would be appreciated
from Duplicating previous day rows for all missing dates dataframe
No comments:
Post a Comment