Objective and Motivation
eval and query are powerful but underrated functions in the pandas API, and their use is far from fully documented or understood. With the right amount of care, query and eval can greatly simplify code, improve performance, and become powerful tools for creating dynamic workflows.
The aim of this canonical Q&A is to give users a better understanding of these functions by discussing some of their lesser-known features, how they are used, and how best to use them, with clear and easy-to-understand examples. The two main topics this post will address are:
- Understanding the engine, parser and target arguments in pd.eval, and how they can be used to evaluate expressions
- Understanding the difference between pd.eval, df.eval and df.query, and when each function is appropriate to use for dynamic execution.
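As a rough orientation for the second topic, the three entry points can be sketched as follows. This is a minimal illustration rather than a full treatment; the frame reuses the df1 setup from the question below, and the expressions themselves are chosen only for demonstration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))

# pd.eval: top-level function; DataFrames are referenced by variable name.
total = pd.eval("df1.A + df1.B")

# df.eval: column names are resolved against df1 itself.
same_total = df1.eval("A + B")

# df.query: evaluates a boolean expression and returns the matching rows.
subset = df1.query("A > B")
```

Both total and same_total hold the row-wise sum of columns A and B, while subset keeps only the rows of df1 where A is greater than B.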
This post is not a substitute for the documentation (links in the answer), so please do go through that as well!
The Question
I will frame a question in a way that opens up discussion of the various features supported by eval.
Given two DataFrames:
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
df1
   A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6
3  8  8  1  6
4  7  7  8  1

df2
   A  B  C  D
0  5  9  8  9
1  4  3  0  3
2  5  0  2  3
3  8  1  3  3
4  3  7  0  1
I would like to perform arithmetic on one or more columns using pd.eval. Specifically, I would like to port the following code:
x = 5
df2['D'] = df1['A'] + (df1['B'] * x)
...to code using eval. The reason for using eval is that I would like to automate many workflows, so creating them dynamically will be useful to me.
I am trying to better understand the engine and parser arguments to determine how best to solve my problem. I have gone through the documentation but the difference was not made clear to me.
- What arguments should be used to ensure my code is working at max performance?
- Is there a way to assign the result of the expression back to df2?
- Also, to make things more complicated, how do I pass x as an argument inside the string expression?
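For concreteness, here is one possible shape the ported code could take. It is a sketch, not a definitive answer. It relies on the documented target and inplace parameters of pd.eval for the assignment, and on pd.eval resolving the local variable x by its plain name. The engine is set to "python" only so the snippet runs without the optional numexpr package (the other documented engine is "numexpr"); it is not a performance recommendation.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
x = 5

# Assign the result to column D of df2. With inplace=True the target
# is modified directly and pd.eval returns None.
pd.eval(
    "D = df1.A + (df1.B * x)",
    target=df2,
    inplace=True,
    engine="python",   # illustrative choice; "numexpr" needs numexpr installed
    parser="pandas",   # the default parser, spelled out for clarity
)

# With df.eval, local variables are referenced with an @ prefix instead.
alt = df1.eval("A + (B * @x)")
```

After the call, df2['D'] holds the same values as df1['A'] + (df1['B'] * x), and alt holds the same values as a separate Series.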
from Dynamic Expression Evaluation in pandas using pd.eval()