Tuesday, 16 March 2021

Unexpected numpy sum behaviour with where parameter

As an example, consider these two NumPy arrays:

>>> a
array([[1, 2, 3], 
       [4, 5, 6]])
>>> b
array([[ True, False,  True],
       [False, False,  True],
       [ True,  True, False]])

Say that, for every pairing of a row of a with a row of b, I want the sum of the elements of the a row that the b row selects. Here are two expressions that do just that:

>>> np.sum(a[:,None] * b[None], 2)
array([[ 4,  3,  3],
       [10,  6,  9]])
>>> np.sum(np.where(b[None], a[:,None], 0), 2)
array([[ 4,  3,  3],
       [10,  6,  9]])
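
To reproduce this, a self-contained sketch might look like the following. The array construction is an assumption inferred from the printed values (the session does not show how a and b were created), and via_product and via_where are illustrative names only.

```python
import numpy as np

# Assumed construction; the session above only shows the printed arrays.
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[True, False, True],
              [False, False, True],
              [True, True, False]])

# a[:, None] has shape (2, 1, 3) and b[None] has shape (1, 3, 3),
# so both expressions work on a broadcast (2, 3, 3) array before
# summing over axis 2.
via_product = np.sum(a[:, None] * b[None], 2)
via_where = np.sum(np.where(b[None], a[:, None], 0), 2)

print(via_product.shape)                       # (2, 3)
print(np.array_equal(via_product, via_where))  # True
```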

I usually use the first option, but I recently found out that np.sum has a where parameter, so I expected this to work as well:

>>> np.sum(a[:,None], 2, where=b[None])
array([[10],
       [25]])

But the result is different: it has shape (2, 1) instead of (2, 3), and each entry is the sum of the corresponding row of the correct result (4 + 3 + 3 = 10 and 10 + 6 + 9 = 25).
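
That relationship can be checked directly. The sketch below rebuilds a and b from the printed values (an assumption, since their construction is not shown) and hard-codes the output of the where= call as reported, a name introduced here for illustration. It deliberately does not re-run that call, because its behaviour may depend on the NumPy version.

```python
import numpy as np

# Assumed construction from the printed values.
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[True, False, True],
              [False, False, True],
              [True, True, False]])

correct = np.sum(a[:, None] * b[None], 2)  # shape (2, 3)
reported = np.array([[10], [25]])          # copied from the where= output above

# Summing the correct result over its last axis, while keeping that axis,
# reproduces the reported values.
row_totals = correct.sum(axis=1, keepdims=True)
print(np.array_equal(row_totals, reported))  # True
```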

I also found that when the shapes already match, so that no broadcasting is needed, both methods give the same result:

>>> a
array([[1, 2, 3], 
       [4, 5, 6]])
>>> b
array([[ True, False,  True],
       [False, False,  True]])
>>> np.sum(a * b, 1)
array([4, 6])
>>> np.sum(a, 1, where=b)
array([4, 6])

What is the explanation for this behaviour? Is there a way to prevent it, or should I stick to my previous method?
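
Since the two methods agree when the shapes already match, one thing that may be worth trying is to broadcast both operands to a common shape explicitly before calling np.sum. This is only a sketch under that assumption: a and b are rebuilt from the printed values, a3 and b3 are illustrative names, and it does not explain why the implicit case behaves differently.

```python
import numpy as np

# Assumed construction from the printed values.
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[True, False, True],
              [False, False, True],
              [True, True, False]])

# np.broadcast_arrays returns views of both inputs with the common
# shape (2, 3, 3), so the array and the mask match element for element.
a3, b3 = np.broadcast_arrays(a[:, None], b[None])

# With matching shapes, where= only has to mask, not broadcast.
result = np.sum(a3, 2, where=b3)
print(result)
```

The broadcast views share memory with the originals, so this should not copy a or b, although the reduction still visits all 2 * 3 * 3 elements.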


