Hemant Vishwakarma: Transforming Kernel Density Estimator for non-negative observations

Friday, 4 August 2023

Transforming Kernel Density Estimator for non-negative observations

I am modelling the distribution of repair costs with the KernelDensity estimator from the scikit-learn package in Python. I have fitted the density to my observations, but when I draw a random sample from this distribution, negative values occur. Since the observations are costs, which are always positive, the sampled values should be non-negative as well.

I have read that this can be achieved by transforming the data. These sources use a log transformation to truncate the distribution at 0 (Log-transform kernel density estimation of income distribution; Kernel Density Estimation for Random Variables with Bounded Support — The Transformation Trick). The problem is that I don't know how to combine this log transformation of my observations with the scikit-learn KernelDensity class.
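For reference, the transformation trick rests on a change of variables. A sketch of the relation, using notation assumed here rather than taken from the sources ($x_i$ are the observed costs, $K$ the kernel, $h$ the bandwidth on the log scale):

```latex
Y = \log X, \qquad
\hat f_Y(y) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{y - \log x_i}{h}\right), \qquad
\hat f_X(x) = \frac{\hat f_Y(\log x)}{x}, \quad x > 0 .
```

The factor $1/x$ is the Jacobian of $y = \log x$. The estimate is only defined for $x > 0$, so it places no mass on negative costs.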

The code for the KDE without transformation is as follows:

import math

import numpy as np
from sklearn.neighbors import KernelDensity

# Series with the observed repair costs
x = costs

maxVal = x.max()
minVal = x.min()
upperBound = math.ceil(maxVal / 1000) * 1000

x_grid = np.linspace(0, upperBound, 1000)

# Fit the KDE to the observations (not to the grid), then evaluate the pdf on the grid
x_obs = np.asarray(x, dtype=float).reshape(-1, 1)
kde = KernelDensity(kernel='gaussian', bandwidth=612).fit(x_obs)
log_pdf = kde.score_samples(x_grid[:, np.newaxis])
pdf = np.exp(log_pdf)
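To illustrate the problem, here is a minimal sketch with synthetic data. The lognormal `costs` array and the seeds are assumptions standing in for the real observations. Only the bandwidth of 612 comes from the code above.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Synthetic stand-in for the repair costs (assumption, not the real data).
rng = np.random.default_rng(0)
costs = rng.lognormal(mean=7.0, sigma=0.8, size=500)

# Gaussian KDE fitted directly on the cost scale.
kde = KernelDensity(kernel="gaussian", bandwidth=612).fit(costs.reshape(-1, 1))

# Each draw is an observation plus Gaussian noise with standard deviation
# equal to the bandwidth, so draws around small costs can fall below zero.
samples = kde.sample(n_samples=1000, random_state=0).ravel()
n_negative = int((samples < 0).sum())
print(f"{n_negative} of {samples.size} sampled costs are negative")
```

With a Gaussian kernel the estimated density has unbounded support, so some negative draws are expected whenever observations lie within a few bandwidths of zero.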

My code including transformation:

# Log transformation and creation of pdf

x_pseudo = x.apply(np.log)

kde_psuedo = KernelDensity(kernel='gaussian', bandwidth=612).fit(x_pseudo[:, np.newaxis])
log_pdf_pseudo = kde_psuedo.score_samples(x_pseudo[:, np.newaxis])
pdf_pseudo=np.exp(log_pdf_pseudo)

x_grid_log = np.linspace(minVal, maxVal, 1000)

density = np.zeros(len(x_grid_log))

for i in range(len(x_grid_log)):
    xx=x_grid_log[i]
    density[i]=pdf_pseudo[xx.apply(np.log)/xx]

output = list(x=x_grid_log, y=density)

This code is based on the example in the second source, which is written in R. I know the code is wrong, but I don't know how to fix it. Any help would be greatly appreciated!
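One way the log-transformation approach might look in Python is sketched below. This is a sketch under stated assumptions, not a definitive fix. The synthetic `costs` array stands in for the real data and is assumed to be strictly positive, since `log(0)` is undefined. The bandwidth of 0.3 is a placeholder: on the log scale, a bandwidth of 612 would smooth everything away.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Synthetic stand-in for the repair costs (assumption: strictly positive values).
rng = np.random.default_rng(0)
costs = rng.lognormal(mean=7.0, sigma=0.8, size=500)

# Fit the KDE to the log of the observations.
# The bandwidth is in log units here; 0.3 is a placeholder, not a tuned value.
log_costs = np.log(costs).reshape(-1, 1)
kde_log = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(log_costs)

# Back-transform the density to the cost scale: f_X(x) = f_Y(log x) / x.
x_grid = np.linspace(costs.min(), costs.max(), 1000)
log_density = kde_log.score_samples(np.log(x_grid).reshape(-1, 1))
density = np.exp(log_density) / x_grid

# Sample on the log scale and exponentiate, so every draw is positive.
samples = np.exp(kde_log.sample(n_samples=1000, random_state=0)).ravel()
```

The grid starts at the smallest observation rather than at 0, because the back-transformation divides by `x`. The bandwidth would need to be chosen again for the log-scale data, for example by cross-validation.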
