As an example, consider these NumPy arrays:
>>> import numpy as np
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> b
array([[ True, False,  True],
       [False, False,  True],
       [ True,  True, False]])
Say I want the sum of each row of a, including only the elements selected by each row of b. Here are two expressions that do just that:
>>> np.sum(a[:,None] * b[None], 2)
array([[ 4,  3,  3],
       [10,  6,  9]])
>>> np.sum(np.where(b[None], a[:,None], 0), 2)
array([[ 4,  3,  3],
       [10,  6,  9]])
I usually use the first option, but I recently found out that np.sum has a where parameter, and I would expect this to work:
>>> np.sum(a[:,None], 2, where=b[None])
array([[10],
       [25]])
But the result is different. I can see that each row of this result is the sum of the corresponding row of the correct result.
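That observation can be checked directly. The snippet below is only a minimal sketch: it rebuilds a and b from the values shown above, and the names expected and collapsed are mine, introduced purely for illustration. It sums the correct result along its second axis and compares that with the values returned by the where call.

```python
import numpy as np

# Arrays rebuilt from the values shown in the question.
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[True, False, True],
              [False, False, True],
              [True, True, False]])

# The result I actually want: one masked row sum per (row of a, row of b) pair.
expected = np.sum(a[:, None] * b[None], axis=2)

# Summing that result over its second axis (the one indexing rows of b)
# gives the same numbers that the where= call returned.
collapsed = expected.sum(axis=1, keepdims=True)
print(collapsed)
# [[10]
#  [25]]
```

So the where call appears to sum over the broadcast axis as well as the requested one, rather than keeping a separate entry per row of b.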
I also found that when dimensions already match without broadcasting, the results using both methods are the same:
>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> b
array([[ True, False,  True],
       [False, False,  True]])
>>> np.sum(a * b, 1)
array([4, 6])
>>> np.sum(a, 1, where=b)
array([4, 6])
What is the explanation for this behaviour? Is there a way to prevent it, or should I stick to my previous method?
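One possible workaround, offered as a sketch rather than a definitive answer: my assumption is that where is broadcast against the array being reduced but does not enlarge the output shape, so expanding both operands to the full shape first should give the where mask a matching input. The names a3, b3, masked, and via_matmul below are hypothetical, chosen only for this example.

```python
import numpy as np

# Arrays rebuilt from the values shown in the question.
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[True, False, True],
              [False, False, True],
              [True, True, False]])

# Broadcast both operands to the common shape (2, 3, 3) before reducing,
# so the input to np.sum already has one slot per (row of a, row of b) pair.
a3, b3 = np.broadcast_arrays(a[:, None], b[None])
masked = np.sum(a3, axis=2, where=b3)
print(masked)
# [[ 4  3  3]
#  [10  6  9]]

# The same numbers can also be written as a matrix product,
# treating the boolean mask as 0/1 weights.
via_matmul = a @ b.T.astype(a.dtype)
print(via_matmul)
# [[ 4  3  3]
#  [10  6  9]]
```

np.broadcast_arrays returns views, so no (2, 3, 3) copy of a is made here; the matrix-product form avoids the three-dimensional intermediate altogether.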
from Unexpected numpy sum behaviour with where parameter