Hemant Vishwakarma: Empirical CDF function in python with reasonable NaN behavior

Tuesday, 24 August 2021

Empirical CDF function in python with reasonable NaN behavior

I'm looking to compute the ECDF and am using this statsmodels function:

import numpy as np
from statsmodels.distributions.empirical_distribution import ECDF

Looks good at first:

>>> ECDF(np.array([0, 1, 2, 3, 3, 3]))(np.array([0, 1, 2, 3, 3, 3]))
array([0.16666667, 0.33333333, 0.5       , 1.        , 1.        ,
       1.        ])

However, NaN seems to be treated as positive infinity:


>>> x = np.array([0,1,2,3, np.nan, np.nan])
>>> ECDF(x)(x)
array([0.16666667, 0.33333333, 0.5       , 0.66666667, 1.        ,
       1.        ])

This is the same result as with infinity in place of NaN:

>>> x = np.array([0, 1, 2, 3, np.inf, np.inf])
>>> ECDF(x)(x)
array([0.16666667, 0.33333333, 0.5       , 0.66666667, 1.        ,
       1.        ])
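
One plausible explanation, assuming ECDF relies on NumPy's sort order internally (an assumption, not something confirmed here): NumPy sorts NaN to the end of an array, after every finite value and after infinity, and np.searchsorted follows the same ordering. The NumPy-only sketch below reproduces the numbers above without statsmodels:

```python
import numpy as np

x = np.array([0, 1, 2, 3, np.nan, np.nan])

# NumPy's sort order places NaN last, so it behaves like the largest value.
sorted_x = np.sort(x)

# Number of sorted entries <= each query value (NaN counts as the maximum).
ranks = np.searchsorted(sorted_x, x, side="right")

# Dividing by the full length, NaNs included, gives 1/6, 2/6, 3/6, 4/6, 1, 1.
fractions = ranks / x.size
print(fractions)
```

The NaN entries both inflate the denominator (6 instead of 4) and receive a value of 1, which matches the behavior observed above.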

By comparison, R leaves the NA values out of the ECDF and returns NA for them:

> x <- c(0,1,2,3,NA,NA)
> x
[1]  0  1  2  3 NA NA
> ecdf(x)(x)
[1] 0.25 0.50 0.75 1.00   NA   NA

What is the standard Python function for computing an ECDF that is NaN-aware?

Patching it with np.where like this does not work:

def ecdf(x):
  return np.where(~np.isfinite(x),
                  np.full_like(x, np.nan),
                  ECDF(x[np.isfinite(x)])(x[np.isfinite(x)]))

ecdf(x)
    ECDF(x[np.isfinite(x)])(x[np.isfinite(x)]))
  File "<__array_function__ internals>", line 6, in where
ValueError: operands could not be broadcast together with shapes (7,) (7,) (4,)
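
The error arises because np.where needs all three arguments to broadcast to a common shape, while the third argument here contains only the finite entries and is therefore shorter. A minimal sketch of one way around this is to fill an output array with NaN and assign through a boolean mask. The name nan_aware_ecdf is hypothetical, and the ECDF is computed with np.sort and np.searchsorted rather than statsmodels, so this is an illustration under those assumptions rather than a standard library function:

```python
import numpy as np

def nan_aware_ecdf(x):
    """ECDF of x evaluated at x itself; NaN inputs give NaN outputs.

    Hypothetical helper: NaNs are dropped before the ECDF is built,
    so they do not count toward the sample size.
    """
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)      # start with NaN everywhere
    valid = ~np.isnan(x)                # mask of non-missing entries
    sorted_valid = np.sort(x[valid])
    if sorted_valid.size:
        # Fraction of non-missing samples <= each non-missing value.
        counts = np.searchsorted(sorted_valid, x[valid], side="right")
        out[valid] = counts / sorted_valid.size
    return out

x = np.array([0, 1, 2, 3, np.nan, np.nan])
print(nan_aware_ecdf(x))  # values: 0.25, 0.5, 0.75, 1.0, nan, nan
```

This treats only NaN as missing, so infinities remain ordinary data points; replacing ~np.isnan(x) with np.isfinite(x) would mask them as well, as in the attempt above. The same mask assignment should also work with statsmodels, along the lines of out[valid] = ECDF(x[valid])(x[valid]).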
