Hemant Vishwakarma: Splitting coef into arrays applicable for multi class

Wednesday, 29 August 2018

Splitting coef into arrays applicable for multi class

I use this function to plot the best and worst features (the largest positive and negative coefficients) for each label.

 import numpy as np
 import matplotlib.pyplot as plt

 def plot_coefficients(classifier, feature_names, top_features=20):
     feature_names = np.array(feature_names)
     coef = classifier.coef_.ravel()
     # one chunk of coefficients per label (6 labels here)
     for i in np.split(coef, 6):
         top_positive_coefficients = np.argsort(i)[-top_features:]
         top_negative_coefficients = np.argsort(i)[:top_features]
         top_coefficients = np.hstack([top_negative_coefficients, top_positive_coefficients])
         # create one plot per label
         plt.figure(figsize=(15, 5))
         colors = ["red" if c < 0 else "blue" for c in i[top_coefficients]]
         plt.bar(np.arange(2 * top_features), i[top_coefficients], color=colors)
         plt.xticks(np.arange(1, 1 + 2 * top_features), feature_names[top_coefficients], rotation=60, ha="right")
         plt.show()

Applying it to sklearn.svm.LinearSVC:

if (name == "LinearSVC"):   
    print(clf.coef_)
    print(clf.intercept_)
    plot_coefficients(clf, cv.get_feature_names())

The matrix produced by the CountVectorizer has shape (15258, 26728), and this is a multi-class decision problem with 6 labels. Calling .ravel() on coef_ returns a flat array of length 6*26728=160368, which means that every index of 26728 or higher is out of bounds for axis 1, which has size 26728. Here are the top and bottom indices for one label:
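To make the index arithmetic concrete, here is a minimal sketch of how a position in the flattened array maps back to a (label, feature) pair. It assumes the default row-major order of .ravel(), so each label occupies one contiguous run of 26728 entries. The helper name flat_to_label_feature is hypothetical and not part of NumPy or scikit-learn.

```python
N_FEATURES = 26728  # vocabulary size taken from the shape above
N_LABELS = 6        # number of labels in this problem


def flat_to_label_feature(flat_index, n_features=N_FEATURES):
    """Map an index into coef_.ravel() to (label_row, feature_column).

    Hypothetical helper: relies on row-major flattening, where row r of
    coef_ fills positions r * n_features .. (r + 1) * n_features - 1.
    """
    return divmod(flat_index, n_features)


# The first index that no longer fits the vocabulary belongs to the second row.
print(flat_to_label_feature(26728))
# The last entry of the flat array is the last feature of the last label.
print(flat_to_label_feature(N_LABELS * N_FEATURES - 1))
```

The column part of the result is the value that can safely be used to look up a feature name.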

i[ 0. 0. 0.07465654 ... -0.02112607  0. -0.13656274]
Top [39336 35593 29445 29715 36418 28631 28332 40843 34760 35887 48455 27753
 33291 54136 36067 33961 34644 38816 36407 35781]

i[ 0. 0. 0.07465654 ... -0.02112607  0. -0.13656274]
Bot [39397 40215 34521 39392 34586 32206 36526 42766 48373 31783 35404 30296
 33165 29964 50325 53620 34805 32596 34807 40895]

The first entry in the "top" list has the index 39336. This corresponds to entry 39336-26728=12608 in the vocabulary. What would I need to change in the code so that the indices are valid for each label?
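One possible direction, offered as a sketch rather than a definitive answer: for a multi-class LinearSVC, coef_ already has shape (n_classes, n_features), so iterating over its rows and sorting each row separately keeps every index inside the vocabulary. The function top_feature_indices and the arrays toy_coef and toy_names below are made up for illustration; plotting is left out so the example depends only on NumPy.

```python
import numpy as np


def top_feature_indices(coef_2d, top_features=20):
    """For each class row, return indices of the most negative
    coefficients followed by the most positive ones."""
    coef_2d = np.atleast_2d(np.asarray(coef_2d))
    result = []
    for row in coef_2d:
        order = np.argsort(row)            # ascending, indices are per row
        negative = order[:top_features]    # smallest coefficients
        positive = order[-top_features:]   # largest coefficients
        result.append(np.hstack([negative, positive]))
    return result


# Toy stand-in for classifier.coef_: 2 classes, 5 features (made-up numbers).
toy_coef = np.array([[0.3, -0.2, 0.0, 0.9, -0.7],
                     [-0.1, 0.4, 0.8, -0.5, 0.2]])
toy_names = np.array(["a", "b", "c", "d", "e"])

for label, idx in enumerate(top_feature_indices(toy_coef, top_features=2)):
    # idx can index both the feature names and that label's coefficients
    print(label, toy_names[idx], toy_coef[label, idx])
```

In the plotting function, the same per-row indices could be passed to feature_names and to the bar heights for that label, with one figure per row.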
