Saturday 3 October 2020

Turning a Multiclass Classifier into a Hierarchical Multiclass Classifier

I am using an e-commerce dataset to predict product categories. I use the product description and supplier code as features and the product category as the target.

from sklearn import ensemble, metrics, model_selection, preprocessing
from sklearn.feature_extraction.text import CountVectorizer

# df is a pandas DataFrame with 'description', 'supplier' and 'category' columns
df['joined_features'] = df['description'].astype(str) + ' ' + df['supplier'].astype(str)

# split the dataset into training and validation datasets
train_x, valid_x, train_y, valid_y = model_selection.train_test_split(df['joined_features'], df['category'])

# encode target variable: fit one encoder on all categories so both splits share the same mapping
encoder = preprocessing.LabelEncoder()
encoder.fit(df['category'])
train_y = encoder.transform(train_y)
valid_y = encoder.transform(valid_y)

# count vectorizer object, fitted on the training text only
count_vect = CountVectorizer(analyzer='word')
count_vect.fit(train_x)

# transform training and validation data
xtrain_count = count_vect.transform(train_x)
xvalid_count = count_vect.transform(valid_x)

# train the classifier and evaluate it on the validation set
classifier = ensemble.RandomForestClassifier()
classifier.fit(xtrain_count, train_y)
predictions = classifier.predict(xvalid_count)
print(metrics.accuracy_score(valid_y, predictions))

This gets about 90% accuracy on the validation set. Now I want to predict more categories, and these categories are hierarchical: the category I predicted above is the top-level one, and I want to predict a couple of levels below it.

For example, if I predicted Clothing, I now want to predict Clothing -> Shoes.

I tried concatenating both levels (df['category1'] + df['category2']) and predicting the result as a single label, but accuracy drops to around 2%, which is really low.
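One thing worth ruling out when the number of classes grows: a LabelEncoder assigns integer codes by sorting only the labels it is fitted on, so if separate encoders are fitted on the training and validation labels and some category is missing from one split, the same category gets different codes in the two splits and measured accuracy collapses even when the predictions are right. A small sketch with made-up labels (the '>' separator and category names are hypothetical):

```python
from sklearn.preprocessing import LabelEncoder

# Made-up combined labels; 'Clothing>Hats' never appears in the validation split
train_labels = ["Clothing>Hats", "Clothing>Shoes", "Electronics>Phones"]
valid_labels = ["Clothing>Shoes", "Electronics>Phones"]

# A separate encoder per split: codes are computed from different class lists
separate_train = LabelEncoder().fit_transform(train_labels)
separate_valid = LabelEncoder().fit_transform(valid_labels)
print(separate_train)  # [0 1 2]
print(separate_valid)  # [0 1]  ('Clothing>Shoes' is 1 in train but 0 here)

# One encoder fitted on all known labels keeps the codes aligned across splits
encoder = LabelEncoder().fit(train_labels + valid_labels)
print(encoder.transform(train_labels))  # [0 1 2]
print(encoder.transform(valid_labels))  # [1 2]
```

With only a handful of top-level categories, every class usually appears in both splits and the two mappings happen to coincide; with many combined labels, rare ones are more likely to be missing from one split.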

What is the proper way to make a classifier in a hierarchical fashion?
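One common approach (a technique not taken from the original post) is a top-down, local-classifier-per-parent-node scheme: train one model for the top level, then one model per top-level category trained only on that category's rows. At prediction time the top-level prediction selects which second-level model to use, so the predicted path always respects the hierarchy. A minimal sketch, assuming two levels and plain Python lists of strings; the class name TopDownClassifier and the toy data are hypothetical:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer


class TopDownClassifier:
    """Hypothetical two-level classifier: one model for level 1, one model per level-1 class."""

    def __init__(self):
        self.vectorizer = CountVectorizer(analyzer="word")
        self.top_model = RandomForestClassifier(random_state=0)
        self.sub_models = {}  # level-1 label -> fitted model, or a constant label

    def fit(self, texts, level1, level2):
        X = self.vectorizer.fit_transform(texts)
        self.top_model.fit(X, level1)
        for parent in set(level1):
            rows = [i for i, label in enumerate(level1) if label == parent]
            children = [level2[i] for i in rows]
            if len(set(children)) == 1:
                # Only one child category seen under this parent: no model needed
                self.sub_models[parent] = children[0]
            else:
                # Train the second-level model only on rows belonging to this parent
                model = RandomForestClassifier(random_state=0)
                model.fit(X[rows], children)
                self.sub_models[parent] = model
        return self

    def predict(self, texts):
        X = self.vectorizer.transform(texts)
        parents = [str(p) for p in self.top_model.predict(X)]
        results = [None] * len(texts)
        # Route each row to the second-level model of its predicted parent
        for parent in set(parents):
            rows = [i for i, p in enumerate(parents) if p == parent]
            sub = self.sub_models[parent]
            if isinstance(sub, str):
                children = [sub] * len(rows)
            else:
                children = [str(c) for c in sub.predict(X[rows])]
            for i, child in zip(rows, children):
                results[i] = (parent, child)
        return results


# Toy data, made up for illustration
texts = [
    "red running shoes acme", "leather boots acme",
    "cotton shirt acme", "wool sweater acme",
    "android phone beta", "smart phone beta",
    "usb charger beta", "wall charger beta",
]
level1 = ["Clothing"] * 4 + ["Electronics"] * 4
level2 = ["Shoes", "Shoes", "Tops", "Tops", "Phones", "Phones", "Accessories", "Accessories"]

model = TopDownClassifier().fit(texts, level1, level2)
print(model.predict(["blue running shoes acme", "fast charger beta"]))
```

A trade-off of this design is that a top-level mistake cannot be corrected at the second level. Extending it to deeper hierarchies means repeating the same per-parent split at each level.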
