Wednesday, 15 May 2019

Function containing pandas query can't find local variable only if imported

EDIT Before BOUNTY:

I would like a function that lets me write subset(dataframe, query="col %in% list_with_substrings")

It should return the same dataframe, keeping only the rows whose col value is one of the entries in list_with_substrings.

That is, as if I had written: dataframe.query("col in @list_with_substrings")

It should also be possible to import that function into another script without redefining it every time it is used. If redefining is the only option, it should happen inside the function itself, so that the subset call stays a single line.

Original Post:

I have two scripts:

"dataprocessing.py"

import pandas as pd
def subset(df,query):
    query = query.replace("%in%", "in @")
    query = query.replace("%!in%", "not in @")     
    return pd.DataFrame(df.query(query))

and "test_dataprocessing.py"

from dataprocessing import *

df = pd.DataFrame({'countries':['US','UK','GE','Ch',"DK","SW"]})
countries_to_subset = ['UK','CH']

subset(df,query="countries %in% countries_to_subset")

This produces the error:

pandas.core.computation.ops.UndefinedVariableError: local variable 'countries_to_subset' is not defined

But if I define the function inside the same script, the call works:

def subset(df,query):
    query = query.replace("%in%", "in @")
    query = query.replace("%!in%", "not in @")
    return pd.DataFrame(df.query(query))

subset(df,query="countries %in% countries_to_subset")

Out: 
  countries  GDP
1        UK    2
3     China    4

So I can't import a function that uses query and pass it a local variable? Is there a way to import the subset function "as if" it was defined in the same script?

Tested on Python 3.6 and 3.7 with pandas 0.23.0 and 0.24.2.
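
A likely explanation, offered as a sketch rather than a confirmed diagnosis: query looks up @names in the frame that calls it, which here is subset. That frame's globals belong to the module where subset was defined, so once the function lives in dataprocessing.py it can no longer see countries_to_subset in the test script. One possible workaround under that assumption is to pass the caller's namespace explicitly through the local_dict keyword, which DataFrame.query forwards to pandas.eval. The use of inspect.currentframe() and the merged scope dictionary below are illustrative choices, not something taken from the original post.

```python
import inspect
import re

import pandas as pd


def subset(df, query):
    """Filter df with an R-style %in% / %!in% query string.

    Sketch only: names after %in% are resolved in the namespace of
    whoever calls subset(), not in the module that defines it.
    """
    # Translate the R-style operators into pandas query syntax.
    # Whitespace after the operator is dropped so "@" touches the name.
    query = re.sub(r"%!in%\s*", "not in @", query)
    query = re.sub(r"%in%\s*", "in @", query)

    # Grab the caller's frame and merge its globals and locals
    # (locals win on a name clash).
    caller = inspect.currentframe().f_back
    try:
        scope = {**caller.f_globals, **caller.f_locals}
    finally:
        # Avoid keeping a reference cycle through the frame object.
        del caller

    # local_dict is passed through query() to pandas.eval().
    return df.query(query, local_dict=scope)
```

With this version in dataprocessing.py, the call in test_dataprocessing.py can stay as subset(df, query="countries %in% countries_to_subset"), and the list is found whether it is a module-level variable or a local inside the calling function.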


