Sunday, 22 August 2021

Find a subset of columns based on another dataframe with NaN values?

I'm trying to compute, for each row, the mean of the values in one DataFrame between time points that are marked as events in a second DataFrame.

This is a follow-up to an earlier question, "Find a subset of columns based on another dataframe?", except that the data now contains missing (NaN) values.

import pandas as pd
import numpy as np

# Example measurements
example_g = [["4/20/21 4:20", 302, 0, np.nan, np.nan, np.nan, np.nan, np.nan],
   ["2/17/21 9:20",135, 1, 1.4, 1.8, 2, 8, 10],
   ["2/17/21 9:20", 111, 4, 5, 5.1, 5.2, 5.3, 5.4]]
example_g_table = pd.DataFrame(example_g,columns=['Date_Time','CID', 0.0, 0.1, 0.2, 0.3, 0.4, 0.5])

#Example Timestamps
example_s = [["4/20/21 4:20",302,0, 2, np.nan],
   ["2/17/21 9:20",135,0, 1, 4 ],
   ["2/17/21 9:20",111,3, 4, 5 ]]
example_s_table = pd.DataFrame(example_s,columns=['Date_Time','CID', "event_1", "event_2", "event_3"])

df = pd.merge(left=example_g_table,right=example_s_table,on=['Date_Time','CID'],how='left')

def func(df):
    event_2 = df['event_2']
    event_3 = df['event_3']
    start = event_2 + 2 # offset by 2: the column labelled 0.0 sits at position 2 (after Date_Time and CID), 0.1 at position 3, etc.
    end = event_3 + 2 # same as above
    total = sum(df.iloc[start:end+1]) # this line is the key. It takes the sum of the values of columns in the range of start to finish
    avg = total/(end-start+1) #(end-start+1) gets the count of things in our range
    return avg

df['avg'] = df.apply(func,axis=1)

I get the following error: 
cannot do positional indexing on Index with these indexers [nan] of type float

I have attempted making sure that columns are floats and have tried removing the int() command within the definitions of the events.

How can I perform the same calculations as before wherever possible, while skipping any values that are NaN?
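
For illustration, here is a minimal sketch of one way this could be handled; it is not a confirmed answer from the original thread. It assumes that event_2 and event_3 are positions within the measurement columns 0.0 to 0.5, that a row with a missing event should simply get NaN as its average, and that NaN measurements inside the window should be ignored. The names value_cols and window_mean are hypothetical helpers introduced only for this example.

```python
import numpy as np
import pandas as pd

# Same example data as in the question (np.nan spelling works on all NumPy versions)
example_g = [["4/20/21 4:20", 302, 0, np.nan, np.nan, np.nan, np.nan, np.nan],
             ["2/17/21 9:20", 135, 1, 1.4, 1.8, 2, 8, 10],
             ["2/17/21 9:20", 111, 4, 5, 5.1, 5.2, 5.3, 5.4]]
example_g_table = pd.DataFrame(
    example_g, columns=["Date_Time", "CID", 0.0, 0.1, 0.2, 0.3, 0.4, 0.5])

example_s = [["4/20/21 4:20", 302, 0, 2, np.nan],
             ["2/17/21 9:20", 135, 0, 1, 4],
             ["2/17/21 9:20", 111, 3, 4, 5]]
example_s_table = pd.DataFrame(
    example_s, columns=["Date_Time", "CID", "event_1", "event_2", "event_3"])

df = pd.merge(left=example_g_table, right=example_s_table,
              on=["Date_Time", "CID"], how="left")

# Hypothetical helper list: the measurement columns, in positional order
value_cols = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]

def window_mean(row):
    """Mean of the measurements from event_2 to event_3 (inclusive)."""
    start, end = row["event_2"], row["event_3"]
    # Assumption: if either event is missing, no window can be defined
    if pd.isna(start) or pd.isna(end):
        return np.nan
    # Cast to int because iloc only accepts integer positions
    window = row.loc[value_cols].iloc[int(start):int(end) + 1]
    # Series.mean skips NaN measurements by default
    return window.astype(float).mean()

df["avg"] = df.apply(window_mean, axis=1)
print(df[["CID", "event_2", "event_3", "avg"]])
```

Selecting the measurement columns by label first means the window no longer depends on the "+ 2" offset, so it keeps working if extra columns are added before the measurements.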


