Say I have a pandas data frame where there are time intervals between start and end times, and then a value associated with each interval.
import random
import time
import numpy as np
def random_date(input_dt = None):
if input_dt is None:
start = 921032233
else:
start = dt.datetime.timestamp(pd.to_datetime(input_dt))
d = random.randint(start, int(time.time()))
return dt.datetime.fromtimestamp(d).strftime('%Y-%m-%d %H:%M:%S')
date_ranges = []
for _ in range(200):
date_range = []
for i in range(2):
if i == 0:
date_range.append(random_date())
else:
date_range.append(random_date(date_range[0]))
date_ranges.append(date_range)
date_ranges_df = pd.DataFrame(date_ranges, columns=['start_dt', 'end_dt'])
date_ranges_df['value'] = np.random.random((date_ranges_df.shape[0], 1))
There's 2 ways I can frame the problem and I would accept either answer.
-
Obtain the sum of every different overlapping interval. Meaning there should be a sum associated with varying (non-overlapping and sequentially complete) time intervals. i.e. if the overlapping time intervals are unchanged for a period of time, the sum would remain unchanged and have a single value - then when the overlapping intervals changes in any way (removal or addition of a time interval) a new sum would be calculated. This may involve some self-merge on the table.
-
The other (and maybe easier) way would be to define a standard time interval like 1 hour, and ask what is the sum of all overlapping intervals in this hour segment?
Resulting data frame should have a similar structure with start and end times followed by a value column representing the sum of all values in that interval.
EDIT: to obtain the bounty I would need the solutions for both #1 and #2 methods.
from Sum values associated with time intervals where intervals overlap in python
No comments:
Post a Comment