Wednesday, 16 March 2022

Using two separate sets of time series data to correlate against other datasets

I have two sets of time series data that I want to correlate against my heart rate.

Currently, the two sets look a bit like this,

I've removed user_id as a column for obvious reasons :P

Scheduled,

enter image description here

Actual,

enter image description here

I want to take my heart rate data

enter image description here

These are the functions I used for test data,

def get_stats(group):
    # Summary statistics for one group of values
    return {'min': group.min(), 'max': group.max(),
            'mean': group.mean(), 'median': group.median(),
            'var': group.var(), 'std': group.std()}


def timeFilter(df):
    df_Meds = df.reset_index()

    start_date = '03-01-2020'
    end_date = '12-30-2021'
    mask = (df_Meds['created'] > start_date) & (df_Meds['created'] <= end_date)

    df = df_Meds.loc[mask]
    return df

Passing in just one substance with one set of times, like this

scheduled_substance_user=scheduled_substance_df[scheduled_substance_df.user_id =="xxx"]
dexamphet_scheduled = scheduled_substance_user[scheduled_substance_user.substance =="Dexamphetamine"]
# dexamphet_scheduled.iloc[0]

dexamphet_scheduled.scheduledtimes.iloc[0]

returning this,

[8.0, 14.0, 18.0]
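If these values are hours of the day stored as floats (an assumption based on the output above), note that `pd.to_datetime` treats bare numbers as offsets from the Unix epoch (nanoseconds by default), not as clock times. A minimal sketch of one way to turn them into time-of-day values instead; `scheduled_hours` and the placeholder date are made up for illustration:

```python
import pandas as pd

# Hypothetical input: hours of the day as floats (14.5 means 14:30)
scheduled_hours = [8.0, 14.5, 18.0]

# Interpret each float as a number of hours since midnight
offsets = pd.to_timedelta(scheduled_hours, unit="h")

# Add the offsets to an arbitrary date and keep only the clock time
dose_times = [(pd.Timestamp("2000-01-01") + off).time() for off in offsets]
```

The resulting `datetime.time` objects can be compared against `df.created.dt.time` or a floored version of it.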

I get something like this,

test_df = SubstanceTimingHeartRate(Biometrics,dexamphet_scheduled.scheduledtimes.iloc[0])

test_df[test_df.medication >0]

but now I want to label each row with:

  1. What the substance is,
  2. That substance's duration,
  3. Whether it was an actual or a scheduled timestamp.
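One possible way to carry those three labels is sketched below. It assumes the biometric data has a DatetimeIndex and that dose times are whole hours of the day; `label_doses`, the `source` column, and the sample readings are hypothetical names and data, not part of the original code.

```python
import numpy as np
import pandas as pd

def label_doses(df, dose_hours, duration_hours, substance, source):
    """Return a copy of df with medication, substance and source columns.

    medication is 1 while a dose is active, 2 in the hour after it ends, else 0.
    Fractional hours are truncated by int() in this sketch.
    """
    duration = int(duration_hours)
    active = {(int(h) + k) % 24 for h in dose_hours for k in range(duration)}
    after = {(int(h) + duration) % 24 for h in dose_hours}
    hour = df.index.hour  # requires a DatetimeIndex
    out = df.copy()
    # np.select takes the first matching condition, so "active" wins on overlap
    out["medication"] = np.select(
        [hour.isin(list(active)), hour.isin(list(after))], [1, 2], default=0)
    out["substance"] = substance
    out["source"] = source  # "scheduled" or "actual"
    return out

# Made-up hourly heart-rate readings for a single day
hr = pd.DataFrame({"heart_rate": range(60, 84)},
                  index=pd.date_range("2021-11-01", periods=24, freq="h"))

scheduled = label_doses(hr, [8.0, 14.0, 18.0], 2, "Dexamphetamine", "scheduled")
actual = label_doses(hr, [9.0, 15.0], 2, "Dexamphetamine", "actual")

# Long format: one block of rows per (substance, source) combination
combined = pd.concat([scheduled, actual])
```

Keeping the result in long format means a later `groupby(["substance", "source", "medication"])` can feed straight into a summary function such as `get_stats`.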

enter image description here

I'd appreciate any help with this. Python is super complex to me, and I'm very confused about how best to approach it.

Update to code so far,

scheduled_substance_user=scheduled_substance_df[scheduled_substance_df.user_id =="xxxx"]

n=0
sentiment_scheduled=pd.DataFrame()
for x in scheduled_substance_user.substance.unique():
    drug_scheduled = scheduled_substance_user[scheduled_substance_user.substance ==x]
    # print("get here?")
    print("\tdrug",x,drug_scheduled.substance.unique())
    if n<1:

        n=n+1

        # print("get here?")

        sentiment_scheduled=SubstanceTimingSentiment(Emotions,drug_scheduled.scheduledtimes.iloc[0],drug_scheduled.duration.iloc[0],x,drug_scheduled.index[0],drug_scheduled.enddate.iloc[0])
        print("\t Unique substances",sentiment_scheduled.substance.unique())
       

    else:
    # print("n>=1?")
        # drug_scheduled = scheduled_substance_user[scheduled_substance_user.substance ==x]


        sentiment_scheduled.append(SubstanceTimingSentiment(Emotions,drug_scheduled.scheduledtimes.iloc[0],drug_scheduled.duration.iloc[0],x,drug_scheduled.index[0],drug_scheduled.enddate.iloc[0]), ignore_index=True)
        print("\t Unique substances in dataframe",sentiment_scheduled.substance.unique())

and the function,

def SubstanceTimingSentiment(emo_d,substance_timing,duration,substance_name,startdate,enddate):
    print("call this function?")
    start_date=pd.to_datetime(startdate)
    end_date=pd.to_datetime(enddate)
    # convert times to datetime
    med_times = pd.to_datetime(substance_timing)
    # df.created = pd.to_datetime(df.created)
    df = emo_d.reset_index()

    # drop rows that are empty except for column 0 (i.e., except for df.created)
    # df.dropna(subset=df.columns[1:], inplace=True)

    # convert times to datetime i.e., ['09:00:00', '12:00:00', '15:00:00']
    df.created = pd.to_datetime(df.created)
    df = df[(df.created > start_date) & (df.created < end_date)]
    print("\tstart and end times:",startdate,enddate)

    taken = pd.to_datetime(substance_timing)

    # generate time arrays
    duration = int(duration) # hours
    active = np.array([(taken + pd.Timedelta(f'{h}H')).time for h in range(duration)]).ravel()
    after = (taken + pd.Timedelta(f'{duration}H')).time

    # define boolean masks by label
    conditions = {
        1: df.created.dt.floor('H').dt.time.isin(active),
        2: df.created.dt.floor('H').dt.time.isin(after),
    }

    # create medication column with np.select()
    df['medication'] = np.select(conditions.values(), conditions.keys(), default=0)
    df['substance'] = substance_name
    df = df[['created', 
        'sentiment', 'magnitude','substance','medication']].set_index('created')
    return df

Currently, when it loops through, it doesn't append the new dataframe returned by the function and just keeps the first substance in place :/ like this (log):

drug Dexamphetamine ['Dexamphetamine']
call this function?
    start and end times: 2021-10-29 2021-11-08
     Unique substances ['Dexamphetamine']
    drug Modafinil ['Modafinil']
call this function?
    start and end times: 1996-12-02 2021-11-08
     Unique substances in dataframe ['Dexamphetamine']
    drug Nicotine ['Nicotine']
call this function?
    start and end times: 2022-02-01 2022-02-27
     Unique substances in dataframe ['Dexamphetamine']
    drug L-Theanine ['L-Theanine']
call this function?
    start and end times: 1990-12-26 2022-01-19
     Unique substances in dataframe ['Dexamphetamine']
    drug Ritalin ['Ritalin']
call this function?
    start and end times: 1996-12-02 2022-01-31
     Unique substances in dataframe ['Dexamphetamine']
    drug Vysanth ['Vysanth']
call this function?
    start and end times: 1996-12-02 2021-11-08
     Unique substances in dataframe ['Dexamphetamine']
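A likely cause of the log above: `DataFrame.append` returns a new dataframe and leaves the original untouched, so its result in the `else` branch is thrown away (the method was also removed in pandas 2.0). A minimal sketch of the usual alternative, collecting the frames in a list and concatenating once; `make_frame` is a hypothetical stand-in for `SubstanceTimingSentiment`:

```python
import pandas as pd

def make_frame(name):
    # Hypothetical stand-in for SubstanceTimingSentiment: two rows per substance
    return pd.DataFrame({"sentiment": [0.1, 0.2], "substance": name})

frames = []
for name in ["Dexamphetamine", "Modafinil", "Nicotine"]:
    # list.append mutates the list in place, unlike DataFrame.append
    frames.append(make_frame(name))

# Build the combined dataframe once, after the loop
combined = pd.concat(frames, ignore_index=True)
print(combined.substance.unique())
```

This also removes the need for the `n` counter, since the first substance no longer has to be treated as a special case.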

