Thursday, 1 December 2022

Pandas: resample hourly values to monthly values with offset

I want to aggregate a pandas.Series with an hourly DatetimeIndex to monthly values - while considering the offset to midnight.

Example

Consider the following (uniform) timeseries that spans about 1.5 months.

import pandas as pd
hours = pd.Series(1, pd.date_range('2020-02-23 06:00', freq = 'H', periods=1008))
hours
# 2020-02-23 06:00:00    1
# 2020-02-23 07:00:00    1
#                       ..
# 2020-04-05 04:00:00    1
# 2020-04-05 05:00:00    1
# Freq: H, Length: 1000, dtype: int64

I would like to sum these to months while considering, that days start at 06:00 in this use-case. The result should be:

2020-02-01 06:00:00    168
2020-03-01 06:00:00    744
2020-04-01 06:00:00     96
freq: MS, dtype: int64

How do I do that??


What I've tried and what works

  • I can aggregate to days while considering the offset, using the offset parameter:

    days = hours.resample('D', offset=pd.Timedelta('06:00:00')).sum()
    days
    # 2020-02-23 06:00:00    24
    # 2020-02-24 06:00:00    24
    # ..
    # 2020-04-03 06:00:00    24
    # 2020-04-04 06:00:00    24  
    # Freq: D, dtype: int64
    
  • Using the same method to aggregate to months does not work. The timestamps do not have a time component, and the values are incorrect:

    months = hours.resample('MS', offset=pd.Timedelta('06:00:00')).sum()
    months
    # 2020-02-01    162 # wrong
    # 2020-03-01    744
    # 2020-04-01    102 # wrong
    # Freq: MS, dtype: int64
    
  • I could do the aggregation to months as a second step after aggregating to days. In that case, the values are correct, but the time component is still missing from the timestamps:

    days = hours.resample('D', offset=pd.Timedelta('06:00:00')).sum()
    months = days.resample('MS', offset=pd.Timedelta('06:00:00')).sum()
    months
    # 2020-02-01    168
    # 2020-03-01    744
    # 2020-04-01     96
    # Freq: MS, dtype: int64
    
  • My current workaround is adding the timedelta and resetting the frequency manually.

    months.index += pd.Timedelta('06:00:00')
    months.index.freq = 'MS' 
    months
    # 2020-02-01 06:00:00    168
    # 2020-03-01 06:00:00    744
    # 2020-04-01 06:00:00     96
    # freq: MS, dtype: int64
    


from Pandas: resample hourly values to monthly values with offset

No comments:

Post a Comment