Hemant Vishwakarma: Inconsistent behavior of jitted function

Saturday, 21 July 2018

Inconsistent behavior of jitted function

I have a very simple function like this one:

import numpy as np
from numba import jit
import pandas as pd

@jit
def f_(n, x, y, z):
    for i in range(n):
        z[i] = x[i] * y[i] 

f_(df.shape[0], df["x"].values, df["y"].values, df["z"].values)

I call it on this DataFrame:

df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5], "z": np.nan})

I expected the function to modify the z column in place, like this:

>>> f_(df.shape[0], df["x"].values, df["y"].values, df["z"].values)
>>> df

   x  y     z
0  1  3   3.0
1  2  4   8.0
2  3  5  15.0

This works fine most of the time, but in some cases the data is not modified.

I double-checked a few things:

I haven't found anything wrong with the data points that could cause this.
When I print the result, the data is modified as expected.
If I return the z array from the function, it is modified as expected.
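
The last observation suggests a workaround that does not depend on in-place modification: return the array and assign it back to the column. The following is only a minimal sketch of that idea. The function name `product` and the fallback decorator used when numba is not installed are assumptions made for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd

try:
    from numba import jit
except ImportError:
    # Assumption: fall back to plain Python so the sketch still runs without numba.
    def jit(func):
        return func


@jit
def product(x, y):
    # Allocate the output here instead of writing into a buffer supplied by the caller.
    z = np.empty(x.shape[0], dtype=np.float64)
    for i in range(x.shape[0]):
        z[i] = x[i] * y[i]
    return z


df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5], "z": np.nan})

# Explicit assignment: the result reaches the frame whether or not
# the arrays passed in are views of the frame's data.
df["z"] = product(df["x"].to_numpy(), df["y"].to_numpy())
print(df)
```

With this pattern, the result no longer depends on whether the array handed to the function shares memory with the DataFrame.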

Unfortunately, I couldn't reduce the problem to a minimal reproducible case. For example, removing unrelated columns seems to "fix" the problem, which makes reduction impossible.

Am I using jit in a way it is not intended to be used? Are there any edge cases I should be aware of? Or is this likely a bug?

Edit:

I found the source of the problem. It occurs when the data contains duplicate column names:

>>> df_ = pd.read_json(
...     '{"schema": {"fields":[{"name":"index","type":"integer"},{"name":"v","type":"integer"},{"name":"y","type":"integer"},'
...     '{"name":"v","type":"integer"},{"name":"x","type":"integer"},{"name":"z","type":"number"}],"primaryKey":["index"],"pandas_version":"0.20.0"},'
...     ' "data": [{"index":0,"v":0,"y":3,"v":0,"x":1,"z":null}]}',
...     orient="table")
>>> f_(df_.shape[0], df_["x"].values, df_["y"].values, df_["z"].values)
>>> df_
   v  y  v  x   z
0  0  3  0  1 NaN

If the duplicate is removed, the function works as expected:

>>> df_.drop("v", axis="columns", inplace=True)
>>> f_(df_.shape[0], df_["x"].values, df_["y"].values, df_["z"].values)
>>> df_
   y  x    z
0  3  1  3.0
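
One possible explanation, which nothing above confirms, is that `.values` hands back a copy rather than a view of the frame's data in the duplicate-column case, so the writes land in a temporary array. A rough way to probe this is `np.shares_memory`. The helper name `values_share_memory` and the two small frames below are hypothetical, and the outcome may differ between pandas versions, so no expected output is shown.

```python
import numpy as np
import pandas as pd


def values_share_memory(frame, column):
    """Hypothetical helper: True if two successive .values calls expose the same buffer."""
    first = frame[column].values
    second = frame[column].values
    return bool(np.shares_memory(first, second))


# Frame with unique column names.
unique = pd.DataFrame({"y": [3], "x": [1], "z": [np.nan]})

# Frame with a duplicated "v" column, built with concat so each column keeps its dtype.
duplicated = pd.concat(
    [
        pd.Series([0], name="v"),
        pd.Series([3], name="y"),
        pd.Series([0], name="v"),
        pd.Series([1], name="x"),
        pd.Series([np.nan], name="z"),
    ],
    axis=1,
)

print("unique columns:    ", values_share_memory(unique, "z"))
print("duplicated columns:", values_share_memory(duplicated, "z"))
```

If the check prints False for a frame, every `.values` call on it produces a fresh array, and in-place writes to that array cannot be expected to show up in the DataFrame.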
