Monday, 2 September 2019

Pandas Check last N Rows for values, new column based on results

I have a DataFrame, Df2. I'm trying to check each of the last 10 rows for the column Lead_Lag below - if there's any value besides null in any of those rows, then I want a new column Position to equal 'Y':

import glob
import os

import pandas as pd

def run_HG_AUDUSD_15M_Aggregate():
    Df1 = pd.read_csv(max(glob.iglob(r"C:\Users\cost9\OneDrive\Documents\PYTHON\Daily Tasks\Pairs Trading\HG_AUDUSD\CSV\15M\Lead_Lag\*.csv"), key=os.path.getctime))    
    Df2 = Df1[['Date', 'Close_HG', 'Close_AUDUSD', 'Lead_Lag']]

    Df2['Position'] = ''

    for index,row in Df2.iterrows():
        if Df2.loc[Df2.index.shift(-10):index,"Lead_Lag"].isnull(): 
            continue
        else:
            Df2.loc[index, 'Position'] = "Y"

A sample of the data is as follows:

Date             Close_HG  Close_AUDUSD  Lead_Lag
7/19/2017 12:59  2.7       0.7956
7/19/2017 13:59  2.7       0.7955
7/19/2017 14:14  2.7       0.7954
7/20/2017 3:14   2.7       0.791
7/20/2017 5:44   2.7       0.791
7/20/2017 7:44   2.71      0.7925
7/20/2017 7:59   2.7       0.7924
7/20/2017 8:44   2.7       0.7953        Short_Both
7/20/2017 10:44  2.71      0.7964        Short_Both
7/20/2017 11:14  2.71      0.7963        Short_Both
7/20/2017 11:29  2.71      0.7967        Short_Both
7/20/2017 13:14  2.71      0.796         Short_Both
7/20/2017 13:29  2.71      0.7956        Short_Both
7/20/2017 14:29  2.71      0.7957        Short_Both

So in this case I would want the last two values for the new column Position to be 'Y', as there are values in the Lead_Lag column in at least one of the last 10 rows. I want to apply this on a rolling basis: for instance, the 'Position' value for row 13 would look at rows 3 through 12, the value for row 12 would look at rows 2 through 11, and so on.

Instead I get the error:

NotImplementedError: Not supported for type RangeIndex 

I've tried several variations of the shift method (defining before the loop etc.) and can't get it to work.
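
For what it's worth, the error seems to come from the `Df2.index.shift(-10)` call rather than from the loop itself: `Index.shift` moves the values of a datetime-like index by a frequency, and a frame read with `pd.read_csv` gets a plain `RangeIndex` by default. Here is a minimal sketch that reproduces it. The `demo` frame and the `shift_is_supported` helper are made up for illustration and are not part of the original code.

```python
import pandas as pd

# Hypothetical stand-in for Df2; any frame with the default RangeIndex behaves the same way.
demo = pd.DataFrame({"Lead_Lag": [None, "Short_Both", "Short_Both"]})

def shift_is_supported(index, periods):
    """Return True if index.shift(periods) works, False if pandas refuses it."""
    try:
        index.shift(periods)
        return True
    except NotImplementedError:
        return False

# RangeIndex: pandas raises NotImplementedError, as in the traceback above.
range_index_ok = shift_is_supported(demo.index, -10)

# A DatetimeIndex with a frequency can be shifted, which is what the method is meant for.
dated = pd.date_range("2017-07-19", periods=3, freq="D")
datetime_index_ok = shift_is_supported(dated, -10)
```

Note also that `.isnull()` returns a Series, so even with a valid slice it would need `.all()` or `.any()` before it could be used in an `if`.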

edit: Here's the solution:

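N = 10
Df2['Position'] = ''
# Note: the != "N" test assumes empty Lead_Lag cells hold the string "N";
# if they are NaN, use .notna().any() instead, since NaN != "N" is True.
for index, row in Df2.iterrows():
    if (Df2.loc[index-N:index, "Lead_Lag"] != "N").any():
        Df2.loc[index, 'Position'] = "Y"
    else:
        Df2.loc[index, 'Position'] = "N"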
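
A loop-free variant is also possible. The sketch below assumes that empty Lead_Lag cells are read in as NaN (the pandas default for blank CSV fields) and that "last N rows" means the N rows before the current one, as in the description above. The `flag_recent_signal` function and the `toy` data are made up for illustration.

```python
import pandas as pd

def flag_recent_signal(lead_lag, n=10):
    """Return 'Y' where any of the previous n rows holds a value, else 'N'."""
    has_value = lead_lag.notna().astype(int)
    # shift(1) excludes the current row; min_periods=1 tolerates the short windows at the top.
    recent = has_value.shift(1).rolling(n, min_periods=1).max()
    # NaN (no previous rows at all) compares as False, so the first row gets 'N'.
    return recent.gt(0).map({True: "Y", False: "N"})

# Toy data: a single signal in row 2, checked with a window of 2 rows.
toy = pd.DataFrame({"Lead_Lag": [None, None, "Short_Both", None, None, None]})
toy["Position"] = flag_recent_signal(toy["Lead_Lag"], n=2)
```

With `n=2`, only rows 3 and 4 of the toy frame come out as 'Y', because they are the only rows that have the signal in row 2 among their two predecessors. On the real data the call would presumably be `Df2['Position'] = flag_recent_signal(Df2['Lead_Lag'], n=10)`.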


