Tuesday, 4 July 2023

Sum values associated with time intervals where intervals overlap in python

Say I have a pandas data frame where there are time intervals between start and end times, and then a value associated with each interval.

import random
import time
import numpy as np

def random_date(input_dt = None):
    if input_dt is None:
        start = 921032233
    else:
        start = dt.datetime.timestamp(pd.to_datetime(input_dt))
    d = random.randint(start, int(time.time()))
    return dt.datetime.fromtimestamp(d).strftime('%Y-%m-%d %H:%M:%S')

date_ranges = []
for _ in range(200):
    date_range = []
    for i in range(2):
        if i == 0:
            date_range.append(random_date())
        else:
            date_range.append(random_date(date_range[0]))
    date_ranges.append(date_range)

date_ranges_df = pd.DataFrame(date_ranges, columns=['start_dt', 'end_dt'])
date_ranges_df['value'] = np.random.random((date_ranges_df.shape[0], 1))

There's 2 ways I can frame the problem and I would accept either answer.

  1. Obtain the sum of every different overlapping interval. Meaning there should be a sum associated with varying (non-overlapping and sequentially complete) time intervals. i.e. if the overlapping time intervals are unchanged for a period of time, the sum would remain unchanged and have a single value - then when the overlapping intervals changes in any way (removal or addition of a time interval) a new sum would be calculated. This may involve some self-merge on the table.

  2. The other (and maybe easier) way would be to define a standard time interval like 1 hour, and ask what is the sum of all overlapping intervals in this hour segment?

Resulting data frame should have a similar structure with start and end times followed by a value column representing the sum of all values in that interval.

EDIT: to obtain the bounty I would need the solutions for both #1 and #2 methods.



from Sum values associated with time intervals where intervals overlap in python

No comments:

Post a Comment