Is there a way to check the convergence when fitting a distribution in SciPy?
My goal is to fit a SciPy distribution (namely the Johnson S_U distribution) to dozens of datasets as part of an automated data-monitoring system. Mostly it works fine, but a few datasets are anomalous and clearly do not follow the Johnson S_U distribution. Fits on these datasets diverge silently, i.e. without any warning or error. By contrast, if I switch to R and fit the same data there, the fit never converges, which is the correct outcome: regardless of the fit settings, the R algorithm refuses to declare convergence.
Data: two datasets are available on Dropbox:
data-converging-fit.csv
... a standard dataset where the fit converges nicely
data-diverging-fit.csv
... an anomalous dataset where the fit diverges
Code to fit the distribution:
import pandas as pd
from scipy import stats
distribution_name = 'johnsonsu'
dist = getattr(stats, distribution_name)
convdata = pd.read_csv('data-converging-fit.csv', index_col= 'timestamp')
divdata = pd.read_csv('data-diverging-fit.csv', index_col= 'timestamp')
On the good data, the fitted parameters have a reasonable order of magnitude:
a, b, loc, scale = dist.fit(convdata['target'])
a, b, loc, scale
[out]: (0.3154946859186918,
2.9938226613743932,
0.002176043693009398,
0.045430055488776266)
On the anomalous data, the fitted parameters are unreasonable:
a, b, loc, scale = dist.fit(divdata['target'])
a, b, loc, scale
[out]: (-3424954.6481554992,
7272004.43156841,
-71078.33596490842,
145478.1300979394)
Still, I do not get a single warning that the fit failed to converge.
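One way to surface the missing warning, sketched under assumptions: fit accepts an optimizer keyword, and as far as I can tell it calls the default scipy.optimize.fmin with disp=0, which suppresses the termination message. The hypothetical wrapper checked_fmin below (my own name, not part of SciPy) runs fmin with full_output=True and warns when warnflag is non-zero. The sample is synthetic, not the data from this post.

```python
import warnings

import numpy as np
from scipy import optimize, stats


def checked_fmin(func, x0, args=(), disp=0, **kwargs):
    """Wrapper around optimize.fmin that warns when fmin stops early.

    Hypothetical helper: matches the call that fit makes
    (func, x0, args=..., disp=0) and returns only the parameters.
    """
    xopt, fopt, n_iter, n_calls, warnflag = optimize.fmin(
        func, x0, args=args, disp=disp, full_output=True, **kwargs
    )
    if warnflag != 0:
        # warnflag 1: max function evaluations reached
        # warnflag 2: max iterations reached
        warnings.warn(
            f"fmin stopped early (warnflag={warnflag}) "
            f"after {n_iter} iterations, objective={fopt:.6g}",
            RuntimeWarning,
        )
    return xopt


# Synthetic stand-in for the real data (assumption, not the post's CSV files).
sample = stats.johnsonsu.rvs(0.3, 3.0, loc=0.0, scale=0.05,
                             size=500, random_state=0)

# Same fit as in the post, but with the checking optimizer plugged in.
params = stats.johnsonsu.fit(sample, optimizer=checked_fmin)
```

A zero warnflag only means the simplex stopped moving within the iteration budget; it does not prove that the parameters are sensible, so I have not verified that this flags the anomalous dataset.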
From researching similar questions on StackOverflow, I know the suggestion to bin my data and then use curve_fit. Despite its practicality, that solution is not right in my opinion, since that is not how distributions should be fitted: the binning is arbitrary (the number of bins) and it affects the final fit. A more realistic option might be scipy.optimize.minimize with callbacks to track the progress of convergence; still, I am not sure that it will eventually tell me whether the algorithm converged.
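A minimal sketch of that scipy.optimize.minimize option, to make the idea concrete. It minimises the distribution's negative log-likelihood (via the public nnlf method) and keeps the whole result object, which carries success and message fields. The helper name fit_with_status, the starting point, the choice of Nelder-Mead, and the synthetic sample are all my assumptions.

```python
import numpy as np
from scipy import optimize, stats


def fit_with_status(dist, data, x0=None):
    """Maximum-likelihood fit that returns the full OptimizeResult.

    Hypothetical helper: minimises dist.nnlf directly so that
    res.success and res.message can be inspected afterwards.
    """
    data = np.asarray(data, dtype=float)
    if x0 is None:
        # Assumed start: shape parameters = 1, loc/scale from the sample.
        x0 = [1.0] * dist.numargs + [data.mean(), data.std()]

    history = []  # parameter vector after each iteration

    def record(xk):
        history.append(np.copy(xk))

    res = optimize.minimize(
        lambda theta: dist.nnlf(theta, data),  # inf for invalid parameters
        x0,
        method="Nelder-Mead",
        callback=record,
    )
    return res, history


# Synthetic stand-in for the real data (assumption, not the post's CSV files).
sample = stats.johnsonsu.rvs(0.3, 3.0, loc=0.0, scale=0.05,
                             size=500, random_state=0)

res, history = fit_with_status(stats.johnsonsu, sample)
print(res.success, res.message)
print(res.x)  # a, b, loc, scale
```

Here res.success is False when the iteration or evaluation limit is hit, and history shows whether the parameters were still drifting. Whether that is enough to catch the diverging dataset is exactly what I am unsure about.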