EDIT before bounty:
I would like a function that I can call as subset(df, query="col %in% list_with_substrings"), where df is a pandas DataFrame.
It should return the same dataframe, keeping only the rows whose col value is any of the substrings in list_with_substrings,
as if I had written: dataframe.query("col in @list_with_substrings")
It should be possible to import that function into another script without having to redefine it every time it is used. If redefining is the only option, it should happen inside the function itself, so that the subset call stays a single line.
Original Post:
I have two scripts:
"dataprocessing.py"
import pandas as pd
def subset(df, query):
    query = query.replace("%in%", "in @")
    query = query.replace("%!in%", "not in @")
    return pd.DataFrame(df.query(query))
and "test_dataprocessing.py"
from dataprocessing import *
df = pd.DataFrame({'countries':['US','UK','GE','Ch',"DK","SW"]})
countries_to_subset = ['UK','CH']
subset(df,query="countries %in% countries_to_subset")
This produces the error:
pandas.core.computation.ops.UndefinedVariableError: local variable 'countries_to_subset' is not defined
But if I define the function inside the same script, it works:
def subset(df, query):
    query = query.replace("%in%", "in @")
    query = query.replace("%!in%", "not in @")
    return pd.DataFrame(df.query(query))
subset(df,query="countries %in% countries_to_subset")
Out:
countries GDP
1 UK 2
3 China 4
So I can't import a function that uses query and pass it a local variable? Is there a way to import the subset function "as if" it was defined in the same script?
Tested on Python 3.6 and 3.7 with pandas 0.23.0 and 0.24.2.
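One possible direction, offered as a rough sketch rather than a confirmed fix: the `@name` lookup in `query` resolves names from the frame that calls `query`, which is the body of `subset`, not the script that imported it. The version below reads the caller's frame with `sys._getframe(1)` and hands its variables to `query` through the `local_dict` and `global_dict` keyword arguments, which are forwarded to `pandas.eval`. The `main` function and the regex-based operator rewrite are illustrative choices, not part of the original code, and `sys._getframe` is a CPython implementation detail.

```python
import re
import sys

import pandas as pd


def subset(df, query):
    """Filter df with an R-style query such as 'col %in% some_list'."""
    # Translate the R-style operators into pandas query syntax.
    query = re.sub(r"%!in%\s*", "not in @", query)
    query = re.sub(r"%in%\s*", "in @", query)

    # Look one frame up so that @names resolve in the caller's scope,
    # not in this function's scope (assumption: CPython, direct call).
    caller = sys._getframe(1)
    try:
        caller_locals = dict(caller.f_locals)
        caller_globals = dict(caller.f_globals)
    finally:
        del caller  # avoid keeping a reference cycle to the frame

    return df.query(query, local_dict=caller_locals, global_dict=caller_globals)


def main():
    # Hypothetical caller: the list is a local variable of this function.
    df = pd.DataFrame({"countries": ["US", "UK", "GE", "Ch", "DK", "SW"]})
    countries_to_subset = ["UK", "CH"]
    kept = subset(df, query="countries %in% countries_to_subset")
    dropped = subset(df, query="countries %!in% countries_to_subset")
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = main()
    print(kept)     # only the 'UK' row: 'Ch' does not equal 'CH'
    print(dropped)  # the remaining five rows
```

If this lives in dataprocessing.py, the call in the other script stays a single line. Note that the frame lookup only works when `subset` is called directly from the scope that defines the list; wrapping it in another helper would shift the frame by one.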
from Function containing pandas query can't find local variable only if imported