I have a very simple function like this one:
import numpy as np
from numba import jit
import pandas as pd

@jit
def f_(n, x, y, z):
    # Write the element-wise product of x and y into z in place
    for i in range(n):
        z[i] = x[i] * y[i]

f_(df.shape[0], df["x"].values, df["y"].values, df["z"].values)
I pass it the columns of this DataFrame:
df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5], "z": np.nan})
I expected the function to modify the z column in place, like this:
>>> f_(df.shape[0], df["x"].values, df["y"].values, df["z"].values)
>>> df
x y z
0 1 3 3.0
1 2 4 8.0
2 3 5 15.0
This works fine most of the time, but in other cases it somehow fails to modify the data.
I double-checked things, and:
- I haven't found any problems with the data points that could cause this.
- I can see that the data is modified as expected when I print the result.
- If I return the z array from the function, it is modified as expected.
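The last point suggests a pattern that does not depend on .values returning a view at all: let the jitted function allocate and return the result, then assign it back through pandas. This is a minimal sketch, not a confirmed fix. It assumes numba.njit is available and falls back to plain Python if numba is not installed; multiply_into_new is a hypothetical name. Note that the pandas documentation for to_numpy states that copy=False does not guarantee a no-copy result, so writing through the returned array is not a documented way to mutate a DataFrame.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:  # assumption: numba may be absent; run as plain Python then
    def njit(func):
        return func


@njit
def multiply_into_new(x, y):
    # Allocate a fresh float64 output instead of writing into a pandas-owned buffer
    out = np.empty(x.shape[0])
    for i in range(x.shape[0]):
        out[i] = x[i] * y[i]
    return out


df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5], "z": np.nan})
# Assigning back through pandas works whether or not .values was a view
df["z"] = multiply_into_new(df["x"].to_numpy(), df["y"].to_numpy())
print(df)

# The same pattern on a frame with a duplicated column name ("v")
dup = pd.DataFrame([[0, 3, 0, 1, np.nan]], columns=["v", "y", "v", "x", "z"])
dup["z"] = multiply_into_new(dup["x"].to_numpy(), dup["y"].to_numpy())
print(dup)
```

Because the column is reassigned with df["z"] = ..., the result reaches the DataFrame regardless of how pandas stores the columns internally.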
Unfortunately, I couldn't reduce the problem to a minimal reproducible example. For example, removing unrelated columns seems to "fix" the problem, which makes reduction impossible.
Am I using jit in a way it is not intended to be used? Are there any border cases I should be aware of? Or is this likely a bug?
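One way to probe the border-case question is to check whether the array handed to the jitted function is actually a writable view of the DataFrame's own data. The helper below, column_is_writable_view, is hypothetical and only a heuristic: it reads the column twice and reports True only if both arrays are writable and share memory. If pandas hands back a fresh copy on each access (one possible cause of the symptoms above, assumed here rather than confirmed), in-place writes made by f_ would be lost. Under pandas Copy-on-Write, arrays obtained this way are read-only, so the helper also reports False in that case.

```python
import numpy as np
import pandas as pd


def column_is_writable_view(df, col):
    """Hypothetical heuristic: True if df[col].values looks like a writable
    view into the frame's storage rather than a fresh copy."""
    first = df[col].values
    second = df[col].values
    # A new copy on each access means the two arrays do not share memory,
    # so writes into either one would never reach the DataFrame.
    # A read-only array (e.g. under Copy-on-Write) cannot be written at all.
    return bool(first.flags.writeable and np.shares_memory(first, second))


df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5], "z": np.nan})
print(column_is_writable_view(df, "z"))
```

The result depends on the pandas version and on how the frame was built, which is exactly why it is worth checking on the frames where the in-place update goes missing.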
Edit:
I found the source of the problem. It occurs when the data contains duplicated column names:
>>> df_ = pd.read_json(
...     '{"schema": {"fields":[{"name":"index","type":"integer"},{"name":"v","type":"integer"},'
...     '{"name":"y","type":"integer"},{"name":"v","type":"integer"},{"name":"x","type":"integer"},'
...     '{"name":"z","type":"number"}],"primaryKey":["index"],"pandas_version":"0.20.0"}, '
...     '"data": [{"index":0,"v":0,"y":3,"v":0,"x":1,"z":null}]}',
...     orient="table")
>>> f_(df_.shape[0], df_["x"].values, df_["y"].values, df_["z"].values)
>>> df_
v y v x z
0 0 3 0 1 NaN
If the duplicate is removed, the function works as expected:
>>> df_.drop("v", axis="columns", inplace=True)
>>> f_(df_.shape[0], df_["x"].values, df_["y"].values, df_["z"].values)
>>> df_
y x z
0 3 1 3.0
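Since the failure only shows up with repeated column labels, a cheap precaution is to detect them before calling a function that writes through .values. This is a minimal sketch: it builds a frame with the same column layout directly (instead of via read_json) and uses the standard Index.is_unique and Index.duplicated; the warning message and the decision to warn rather than raise are illustrative choices, not part of the original code.

```python
import warnings

import numpy as np
import pandas as pd

# Same column layout as the read_json example, built directly
df_ = pd.DataFrame([[0, 3, 0, 1, np.nan]], columns=["v", "y", "v", "x", "z"])

# Labels that appear more than once in the column index
dupes = df_.columns[df_.columns.duplicated()].unique().tolist()
print(df_.columns.is_unique)  # False
print(dupes)                  # ['v']

if not df_.columns.is_unique:
    # Illustrative guard before an in-place update through .values
    warnings.warn(f"duplicate column names {dupes}; in-place writes may be lost")
```

Raising a ValueError instead of warning would be the stricter option if silently lost updates are unacceptable.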
from Inconsistent behavior of jitted function