Monday, 3 May 2021

Measure of Feature Importance in PCA

I am doing Principal Component Analysis (PCA) and I'd like to find out which features contribute the most to the result.

My intuition is to sum the absolute values of each feature's contribution (its loading) across all the individual components.

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1, 4, 1], [-2, -1, 4, 2], [-3, -2, 4, 3],
              [1, 1, 4, 4], [2, 1, 4, 5], [3, 2, 4, 6]])

# Keep enough components to explain 95% of the variance.
pca = PCA(n_components=0.95, whiten=True, svd_solver='full').fit(X)

pca.components_
# array([[ 0.71417303,  0.46711713,  0.        ,  0.52130459],
#        [-0.46602418, -0.23839061, -0.        ,  0.85205128]])

# Sum the absolute loadings of each feature over the components.
np.sum(np.abs(pca.components_), axis=0)
# array([1.18019721, 0.70550774, 0.        , 1.37335586])

This yields, in my eyes, a measure of the importance of each of the original features. Note that the third feature has zero importance because I intentionally made that column a constant.
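One refinement that occurs to me (just a sketch of an idea, not something I've validated): weight each component's absolute loadings by that component's explained variance ratio, so that loadings on components that explain little variance count for less. Continuing from the snippet above:

# Hypothetical variant: scale |loadings| by each component's
# explained variance ratio before summing over components.
weighted = np.abs(pca.components_).T @ pca.explained_variance_ratio_
# Result has shape (n_features,); the constant 3rd column still gets 0.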

Is there a better "measure of importance" for PCA?



from Measure of Feature Importance in PCA
