I'm trying to compute the mean of the values in one data frame between time points that are marked as events in a second data frame.
This is a follow-up to an earlier question, "Find a subset of columns based on another dataframe?", except that the data now contains missing (NaN) values.
import pandas as pd
import numpy as np
# Example data (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
example_g = [["4/20/21 4:20", 302, 0, np.nan, np.nan, np.nan, np.nan, np.nan],
             ["2/17/21 9:20", 135, 1, 1.4, 1.8, 2, 8, 10],
             ["2/17/21 9:20", 111, 4, 5, 5.1, 5.2, 5.3, 5.4]]
example_g_table = pd.DataFrame(example_g, columns=['Date_Time', 'CID', 0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
# Example timestamps
example_s = [["4/20/21 4:20", 302, 0, 2, np.nan],
             ["2/17/21 9:20", 135, 0, 1, 4],
             ["2/17/21 9:20", 111, 3, 4, 5]]
example_s_table = pd.DataFrame(example_s, columns=['Date_Time', 'CID', "event_1", "event_2", "event_3"])
df = pd.merge(left=example_g_table, right=example_s_table, on=['Date_Time', 'CID'], how='left')
def func(df):
    event_2 = df['event_2']
    event_3 = df['event_3']
    start = event_2 + 2  # offset by 2: the column labeled 0.0 is the third column (position 2), 0.1 is the fourth, etc.
    end = event_3 + 2  # same offset as above
    total = sum(df.iloc[start:end+1])  # this line is the key: it sums the values in the columns from start to end, inclusive
    avg = total/(end-start+1)  # (end-start+1) is the number of columns in the range
    return avg

df['avg'] = df.apply(func, axis=1)
I get the following error:
cannot do positional indexing on Index with these indexers [nan] of type float
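A likely cause, sketched here on a small stand-in frame rather than the real data: a column that contains NaN is stored as float64, so a slice bound derived from it is a float (or NaN), and iloc only accepts integer positions. The names `events` and `values` below are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the event columns: event_3 has one missing value.
events = pd.DataFrame({"event_2": [2, 1, 4], "event_3": [np.nan, 4, 5]})
print(events.dtypes)  # event_2 stays int64, event_3 becomes float64 because of the NaN

row = events.iloc[0]                         # the row whose event_3 is NaN
values = pd.Series([0.0, 1.0, 2.0, 3.0])     # hypothetical values to slice

try:
    values.iloc[row["event_2"]:row["event_3"]]   # float / NaN slice bounds
    slice_failed = False
except TypeError as exc:
    slice_failed = True
    print(exc)
```

If this is what is happening, casting the columns to float does not help, because the slice bounds need to be integers and NaN has no integer equivalent.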
I have tried making sure that the columns are floats, and I have tried removing the int() calls from the event definitions.
How can I perform the same calculations as before where possible, while skipping any values that are NaN?
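One possible approach, as a minimal sketch rather than a definitive answer: return NaN when either event is missing, cast the bounds to int otherwise, and let the mean skip NaN inside the window. The helper name `window_mean` and the list `value_cols` are assumptions made for this example, and the frame is built directly in its merged shape to keep the block self-contained.

```python
import numpy as np
import pandas as pd

# Labels of the measurement columns (assumed to match the example data).
value_cols = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]

# Same rows as the merged example frame: measurements followed by the events.
df = pd.DataFrame(
    [
        ["4/20/21 4:20", 302, 0, np.nan, np.nan, np.nan, np.nan, np.nan, 0, 2, np.nan],
        ["2/17/21 9:20", 135, 1, 1.4, 1.8, 2, 8, 10, 0, 1, 4],
        ["2/17/21 9:20", 111, 4, 5, 5.1, 5.2, 5.3, 5.4, 3, 4, 5],
    ],
    columns=["Date_Time", "CID", *value_cols, "event_1", "event_2", "event_3"],
)

def window_mean(row):
    """Mean of the measurement columns from event_2 to event_3, inclusive."""
    start, end = row["event_2"], row["event_3"]
    if pd.isna(start) or pd.isna(end):
        return np.nan                      # no window can be defined for this row
    # Positions are relative to the measurement columns, so no +2 offset is needed.
    window = row.loc[value_cols].iloc[int(start):int(end) + 1]
    return pd.to_numeric(window).mean()    # mean() skips NaN by default

df["avg"] = df.apply(window_mean, axis=1)
print(df[["CID", "event_2", "event_3", "avg"]])
```

Note one difference from the original formula: mean() divides by the number of non-NaN values in the window, not by (end-start+1), so a window with gaps is averaged over the values that are actually present.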
from Find a subset of columns based on another dataframe with NaN values?