Saturday, 19 December 2020

Something wrong when implementing SVM One-vs-all in python

I was trying to check that I had correctly understood how SVM OvA (One-versus-All) works by comparing scikit-learn's OneVsRestClassifier with my own implementation.

In the following code, I train num_classes binary classifiers (each one separates one class from all the others), then run all of them on the test set and, for each sample, pick the class whose classifier returns the highest probability.

```python
import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import scale

# Read dataset
df = pd.read_csv('In/winequality-white.csv', delimiter=';')
X = df.loc[:, df.columns != 'quality']
Y = df.loc[:, df.columns == 'quality']
my_classes = np.unique(Y)          # sorted quality labels present in the data
num_classes = len(my_classes)

# Train-test split (random mask, roughly 80/20)
np.random.seed(42)
msk = np.random.rand(len(df)) <= 0.8
train = df[msk]
test = df[~msk]

# From dataset to features and labels
X_train = train.loc[:, train.columns != 'quality']
Y_train = train.loc[:, train.columns == 'quality']
X_test = test.loc[:, test.columns != 'quality']
Y_test = test.loc[:, test.columns == 'quality']

# Models: one binary SVC per class (my_classes[k] vs. all the others)
clf = [None] * num_classes
for k in np.arange(0, num_classes):
    my_model = SVC(gamma='auto', C=1000, kernel='rbf', class_weight='balanced', probability=True)
    clf[k] = my_model.fit(X_train, Y_train == my_classes[k])

# Prediction: column k holds model k's probability that the sample is my_classes[k]
prob_table = np.zeros((len(Y_test), num_classes))
for k in np.arange(0, num_classes):
    p = clf[k].predict_proba(X_test)
    prob_table[:, k] = p[:, list(clf[k].classes_).index(True)]
# argmax returns the column index (0..num_classes-1), not the quality label itself
Y_pred = prob_table.argmax(axis=1)

print("Test accuracy = ", accuracy_score(Y_test, Y_pred) * 100, "\n\n")
```

Test accuracy is 0.21 (the script prints it as a percentage), while with OneVsRestClassifier it is 0.59. For completeness, here is that code as well (the pre-processing steps are the same as before):

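```python
# ... same imports, data loading and train-test split as above ...
from sklearn.multiclass import OneVsRestClassifier

clf = OneVsRestClassifier(SVC(gamma='auto', C=1000, kernel='rbf', class_weight='balanced'))
clf.fit(X_train, Y_train)
Y_pred = clf.predict(X_test)
print("Test accuracy = ", accuracy_score(Y_test, Y_pred) * 100, "\n\n")
```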
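For comparison, here is a sketch of what OneVsRestClassifier appears to do when it predicts. As I understand scikit-learn's implementation, it fits one binary copy of the estimator per class on 0/1 targets, scores each sample with each estimator's decision_function (falling back to predict_proba only when there is no decision_function), takes the largest score, and maps that winning column back through its classes_ attribute. The sketch assumes a synthetic dataset from make_classification, with labels shifted to 3..6 purely to mimic non-zero-based quality scores; it is not the wine data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the wine data (assumption): 4 classes,
# shifted by 3 so the labels are 3..6 instead of 0..3.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=4, random_state=0)
y = y + 3
classes = np.unique(y)

# Hand-rolled one-vs-rest: one binary SVC per class (class c vs. the rest).
models = [
    SVC(gamma='auto', C=1000, kernel='rbf', class_weight='balanced').fit(X, (y == c).astype(int))
    for c in classes
]

# Column k holds model k's decision score for "this sample is classes[k]".
scores = np.column_stack([m.decision_function(X) for m in models])

# argmax gives a column position (0..3); index `classes` to get the real label.
manual_pred = classes[scores.argmax(axis=1)]

# Built-in one-vs-rest with the same base estimator.
ovr = OneVsRestClassifier(SVC(gamma='auto', C=1000, kernel='rbf', class_weight='balanced'))
ovr_pred = ovr.fit(X, y).predict(X)

print("agreement:", np.mean(manual_pred == ovr_pred))
```

Two things differ from the hand-rolled version in the question: OneVsRestClassifier ranks classes by decision_function rather than by predict_proba (which SVC obtains through Platt scaling with an internal cross-validation, so it need not rank samples identically), and it returns labels from classes_ rather than raw column positions.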

Is there something wrong in my own implementation of SVM - OVA?
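One likely cause, offered as a suggestion rather than a verified answer: prob_table.argmax(axis=1) returns column positions 0..num_classes-1, whereas Y_test holds the quality scores themselves (the values in my_classes). accuracy_score therefore compares positions with labels and only counts a hit when the two happen to coincide. Indexing my_classes with those positions, i.e. Y_pred = my_classes[prob_table.argmax(axis=1)], turns them back into labels. The toy example below uses a hypothetical 3-class probability table; the labels 5, 6, 7 and all probabilities are made up for illustration.

```python
import numpy as np

# Hypothetical values, not from the wine data: 3 test samples, classes labelled 5, 6, 7.
my_classes = np.array([5, 6, 7])
prob_table = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.2, 0.3, 0.5]])
y_true = np.array([5, 6, 7])

positions = prob_table.argmax(axis=1)   # column indices: [0 1 2]
labels = my_classes[positions]          # mapped back to class labels: [5 6 7]

print(positions, labels)
print("accuracy with positions:", np.mean(positions == y_true))  # 0.0
print("accuracy with labels:   ", np.mean(labels == y_true))     # 1.0
```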