I'm looking to compute the ECDF and am using this statsmodels function:
import numpy as np
from statsmodels.distributions.empirical_distribution import ECDF
Looks good at first:
>>> x = np.array([0, 1, 2, 3, 3, 3])
>>> ECDF(x)(x)
array([0.16666667, 0.33333333, 0.5       , 1.        , 1.        ,
       1.        ])
However, nan seems to be treated as infinity:
>>> x = np.array([0,1,2,3, np.nan, np.nan])
>>> ECDF(x)(x)
array([0.16666667, 0.33333333, 0.5       , 0.66666667, 1.        ,
       1.        ])
This is the same result as with explicit infinities:
>>> x = np.array([0, 1, 2, 3, np.inf, np.inf])
>>> ECDF(x)(x)
array([0.16666667, 0.33333333, 0.5       , 0.66666667, 1.        ,
       1.        ])
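A possible explanation, offered only as a guess: if ECDF sorts its input and ranks values with a binary search (an assumption about its internals, not something checked here), then the NaN results follow from how NumPy orders NaN. np.sort places NaN values at the end, and np.searchsorted treats NaN as larger than any number. The NumPy-only sketch below reproduces the figures above without statsmodels; the names sorted_x, rank and ecdf_like are illustrative.

```python
import numpy as np

x = np.array([0, 1, 2, 3, np.nan, np.nan])

# np.sort puts NaN values last, so they behave like the largest entries.
sorted_x = np.sort(x)

# For each value, count how many sorted entries are at or before it.
# A NaN query lands past the end of the array, i.e. at rank len(x).
rank = np.searchsorted(sorted_x, x, side="right")

# Dividing by the total count (NaN included) gives 4/6 for the value 3
# and 6/6 for each NaN, matching the output shown above.
ecdf_like = rank / x.size
print(ecdf_like)
```

Under this reading, the NaN entries are also counted in the denominator, which is why 3 maps to 0.667 rather than 1.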
Comparing with R:
> x <- c(0,1,2,3,NA,NA)
> x
[1] 0 1 2 3 NA NA
> ecdf(x)(x)
[1] 0.25 0.50 0.75 1.00 NA NA
What is the standard Python function for an ECDF that is NaN-aware?
Hot-wiring like so does not seem to work:
def ecdf(x):
    return np.where(~np.isfinite(x),
                    np.full_like(x, np.nan),
                    ECDF(x[np.isfinite(x)])(x[np.isfinite(x)]))
ecdf(x)
ECDF(x[np.isfinite(x)])(x[np.isfinite(x)]))
File "<__array_function__ internals>", line 6, in where
ValueError: operands could not be broadcast together with shapes (7,) (7,) (4,)
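The error arises because np.where needs its three arguments to broadcast to a common shape, while the third argument here holds only the finite entries and is therefore shorter than the mask. One way around this, as a minimal sketch rather than a standard library function, is to allocate a NaN-filled result and assign through a boolean mask. The function name nan_aware_ecdf is hypothetical, and the body uses plain NumPy with a right-continuous step convention in place of statsmodels' ECDF, which is an assumption made to keep the example self-contained.

```python
import numpy as np

def nan_aware_ecdf(x):
    """ECDF evaluated at each element of x, with NaN passed through as NaN.

    Hypothetical helper: NaN entries are excluded from both the ranking
    and the sample size, similar to the R output shown earlier.
    """
    x = np.asarray(x, dtype=float)
    valid = ~np.isnan(x)                  # mask of usable observations
    out = np.full(x.shape, np.nan)        # NaN wherever the input is NaN
    sorted_valid = np.sort(x[valid])
    n = sorted_valid.size
    if n > 0:
        # Fraction of valid observations <= each valid value.
        ranks = np.searchsorted(sorted_valid, x[valid], side="right")
        out[valid] = ranks / n
    return out

x = np.array([0, 1, 2, 3, np.nan, np.nan])
print(nan_aware_ecdf(x))
```

For the sample above this yields 0.25, 0.5, 0.75, 1.0 for the four numbers and NaN for the two missing entries. Replacing np.isnan with a np.isfinite test would additionally mask infinities, as the original attempt did; the same masked-assignment pattern should also work with ECDF(x[valid])(x[valid]) on the right-hand side.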
from Empirical CDF function in python with reasonable NaN behavior