I have two sets of time series data that I want to correlate against my heart rate.
Currently, the two sets look a bit like this,
I've removed user_id as a column for obvious reasons :P
Scheduled,
I want to take my heart rate data
These are the functions I used for test data,
def get_stats(group):
    # note: diff is computed here but not used in the returned stats
    diff = group.astype(int).abs().diff()
    return {'min': group.min(), 'max': group.max(),
            'mean': group.mean(), 'median': group.median(),
            'var': group.var(), 'std': group.std()}
def timeFilter(df):
    df_Meds = df.reset_index()
    start_date = '03-01-2020'
    end_date = '12-30-2021'
    mask = (df_Meds['created'] > start_date) & (df_Meds['created'] <= end_date)
    df = df_Meds.loc[mask]
    return df
Passing in just one substance with one set of times, like this,
scheduled_substance_user = scheduled_substance_df[scheduled_substance_df.user_id == "xxx"]
dexamphet_scheduled = scheduled_substance_user[scheduled_substance_user.substance == "Dexamphetamine"]
# dexamphet_scheduled.iloc[0]
dexamphet_scheduled.scheduledtimes.iloc[0]
returning this,
[8.0, 14.0, 18.0]
I get something like this,
test_df = SubstanceTimingHeartRate(Biometrics, dexamphet_scheduled.scheduledtimes.iloc[0])
test_df[test_df.medication > 0]
but now I want to label each row with:
- What the substance is,
- That substance's own duration (and use it for the active window),
- Whether it was an actual or a scheduled timestamp.
Appreciate any help with this - Python is super complex to me, and I'm very confused about how best to approach it.
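For what it's worth, here is a minimal sketch of the kind of labelling described in the list above. It is not the `SubstanceTimingHeartRate` function (that one isn't shown here); `label_doses`, the `heart_rate` column, and the `source` column are hypothetical names made up for illustration. It assumes the readings have a DatetimeIndex and that scheduled times are hours of the day such as `[8.0, 14.0, 18.0]`; fractional hours are simply truncated.

```python
import numpy as np
import pandas as pd

def label_doses(df, dose_hours, duration, substance, source):
    """Label each row of a time-indexed frame relative to dose times.

    df         : DataFrame with a DatetimeIndex (assumed layout)
    dose_hours : hours of day a dose is taken, e.g. [8.0, 14.0, 18.0]
    duration   : hours the substance is assumed to stay active
    source     : 'scheduled' or 'actual' (hypothetical label)
    """
    out = df.copy()
    hour = out.index.hour                      # hour of day of every reading
    starts = np.array(dose_hours, dtype=int)   # 8.0 -> 8 (fractions truncated)
    duration = int(duration)
    # Hours of day covered by any dose window, wrapping past midnight.
    active = {int((s + h) % 24) for s in starts for h in range(duration)}
    # The single hour right after each window ends.
    after = {int((s + duration) % 24) for s in starts}
    # 1 = active, 2 = first hour after it wears off, 0 = neither.
    # np.select takes the first matching condition, so "active" wins on overlap.
    out["medication"] = np.select(
        [hour.isin(list(active)), hour.isin(list(after))], [1, 2], default=0
    )
    out["substance"] = substance
    out["duration"] = duration
    out["source"] = source
    return out

# Tiny made-up example: one dose at 08:00 that lasts two hours.
idx = pd.to_datetime(["2021-11-01 07:30", "2021-11-01 08:15",
                      "2021-11-01 09:59", "2021-11-01 10:05"])
hr = pd.DataFrame({"heart_rate": [60, 72, 80, 75]}, index=idx)
labelled = label_doses(hr, [8.0], 2, "Dexamphetamine", "scheduled")
```

Calling it once per substance (and once each for the scheduled and the actual times) would give frames that can be stacked with `pd.concat`, so every row carries its own substance, duration, and source.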
Update to code so far,
scheduled_substance_user = scheduled_substance_df[scheduled_substance_df.user_id == "xxxx"]
n = 0
sentiment_scheduled = pd.DataFrame()
for x in scheduled_substance_user.substance.unique():
    drug_scheduled = scheduled_substance_user[scheduled_substance_user.substance == x]
    # print("get here?")
    print("\tdrug", x, drug_scheduled.substance.unique())
    if n < 1:
        n = n + 1
        # print("get here?")
        sentiment_scheduled = SubstanceTimingSentiment(
            Emotions, drug_scheduled.scheduledtimes.iloc[0],
            drug_scheduled.duration.iloc[0], x,
            drug_scheduled.index[0], drug_scheduled.enddate.iloc[0])
        print("\t Unique substances", sentiment_scheduled.substance.unique())
    else:
        # print("n>=1?")
        # drug_scheduled = scheduled_substance_user[scheduled_substance_user.substance ==x]
        sentiment_scheduled.append(SubstanceTimingSentiment(
            Emotions, drug_scheduled.scheduledtimes.iloc[0],
            drug_scheduled.duration.iloc[0], x,
            drug_scheduled.index[0], drug_scheduled.enddate.iloc[0]), ignore_index=True)
        print("\t Unique substances in dataframe", sentiment_scheduled.substance.unique())
and the function,
def SubstanceTimingSentiment(emo_d, substance_timing, duration, substance_name, startdate, enddate):
    print("call this function?")
    start_date = pd.to_datetime(startdate)
    end_date = pd.to_datetime(enddate)
    # df.created = pd.to_datetime(df.created)
    df = emo_d.reset_index()
    # drop rows that are empty except for column 0 (i.e., except for df.created)
    # df.dropna(subset=df.columns[1:], inplace=True)
    df.created = pd.to_datetime(df.created)
    # keep only rows inside the window; copy so the new columns below
    # are added to an independent frame rather than a slice
    df = df[(df.created > start_date) & (df.created < end_date)].copy()
    print("\tstart and end times:", startdate, enddate)
    # convert times to datetime i.e., ['09:00:00', '12:00:00', '15:00:00']
    taken = pd.to_datetime(substance_timing)
    # generate time arrays
    duration = int(duration)  # hours
    active = np.array([(taken + pd.Timedelta(f'{h}H')).time for h in range(duration)]).ravel()
    after = (taken + pd.Timedelta(f'{duration}H')).time
    # define boolean masks by label
    conditions = {
        1: df.created.dt.floor('H').dt.time.isin(active),
        2: df.created.dt.floor('H').dt.time.isin(after),
    }
    # create medication column with np.select()
    df['medication'] = np.select(list(conditions.values()), list(conditions.keys()), default=0)
    df['substance'] = substance_name
    df = df[['created',
             'sentiment', 'magnitude', 'substance', 'medication']].set_index('created')
    return df
Currently, when it loops through, it isn't appending the new dataframe returned by the function; it just keeps the first substance in place :/ like this (log),
drug Dexamphetamine ['Dexamphetamine']
call this function?
start and end times: 2021-10-29 2021-11-08
Unique substances ['Dexamphetamine']
drug Modafinil ['Modafinil']
call this function?
start and end times: 1996-12-02 2021-11-08
Unique substances in dataframe ['Dexamphetamine']
drug Nicotine ['Nicotine']
call this function?
start and end times: 2022-02-01 2022-02-27
Unique substances in dataframe ['Dexamphetamine']
drug L-Theanine ['L-Theanine']
call this function?
start and end times: 1990-12-26 2022-01-19
Unique substances in dataframe ['Dexamphetamine']
drug Ritalin ['Ritalin']
call this function?
start and end times: 1996-12-02 2022-01-31
Unique substances in dataframe ['Dexamphetamine']
drug Vysanth ['Vysanth']
call this function?
start and end times: 1996-12-02 2021-11-08
Unique substances in dataframe ['Dexamphetamine']
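A likely explanation for the log above: `DataFrame.append` never modified the frame in place. It returned a new DataFrame, so a bare `sentiment_scheduled.append(...)` throws its result away (and the method was deprecated in pandas 1.4 and removed in 2.0). A common pattern is to collect the pieces in a plain list and call `pd.concat` once at the end. The sketch below shows only that pattern; `collect_frames` and `fake_frame` are hypothetical stand-ins, with `fake_frame` playing the role of `SubstanceTimingSentiment`.

```python
import pandas as pd

def collect_frames(substances, build_frame):
    """Call build_frame(name) for each substance and stack the results."""
    frames = []
    for name in substances:
        # list.append DOES mutate the list in place, unlike DataFrame.append
        frames.append(build_frame(name))
    # One concat at the end; the original index of each piece is kept.
    return pd.concat(frames)

# Hypothetical stand-in for SubstanceTimingSentiment, with made-up rows.
def fake_frame(name):
    return pd.DataFrame({"substance": [name, name], "medication": [1, 0]})

combined = collect_frames(["Dexamphetamine", "Modafinil", "Nicotine"], fake_frame)
print(combined.substance.unique())
```

Passing `ignore_index=True` to `pd.concat` would mirror the original `append` call, but it would also discard the `created` index that the function sets, so it is left out here.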
from using two separate sets of time series data to correlate against other datasets



