I trained a GaussianNB model from scikit-learn. When I call classifier.predict_proba on new data it only returns 1 or 0. It is supposed to return the model's confidence that each prediction is correct, and I doubt it can be 100% confident on data it has never seen before. I have tested it on several different inputs. I use CountVectorizer and TfidfTransformer for the text encoding.
The encoding:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer

count_vect = CountVectorizer()
tfidf_transformer = TfidfTransformer()

# Fit vocabulary and idf weights on the training text only
X_train_counts = count_vect.fit_transform(X_train_word)
X_train = tfidf_transformer.fit_transform(X_train_counts).toarray()
print(X_train)

# Reuse the fitted vocabulary/idf on the test text
X_test_counts = count_vect.transform(X_test_word)
X_test = tfidf_transformer.transform(X_test_counts).toarray()
print(X_test)
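For reference, the two-step CountVectorizer + TfidfTransformer encoding above should be equivalent to scikit-learn's TfidfVectorizer with default settings. A minimal sketch on a made-up toy corpus (not my real data):

```python
from sklearn.feature_extraction.text import (
    CountVectorizer, TfidfTransformer, TfidfVectorizer,
)
import numpy as np

# Hypothetical toy corpus for illustration only
docs_train = ["good movie", "bad film", "great good movie"]
docs_test = ["bad movie"]

# Two-step encoding, as in the question
cv = CountVectorizer()
tt = TfidfTransformer()
A_train = tt.fit_transform(cv.fit_transform(docs_train)).toarray()
A_test = tt.transform(cv.transform(docs_test)).toarray()

# One-step equivalent
tv = TfidfVectorizer()
B_train = tv.fit_transform(docs_train).toarray()
B_test = tv.transform(docs_test).toarray()

print(np.allclose(A_train, B_train))  # True
print(np.allclose(A_test, B_test))    # True
```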
The model (I am getting an accuracy of 91%):
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)
# Predict Class
y_pred = classifier.predict(X_test)
# Accuracy
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)
And finally, when I use the predict_proba method:
y_pred = classifier.predict_proba(X_test)
print(y_pred)
I am getting an output like:
[[0. 1.]
[1. 0.]
[0. 1.]
...
[1. 0.]
[1. 0.]
[1. 0.]]
It doesn't make much sense to get 100% confidence on new data. Besides y_test, I have tested it on other inputs and it still returns the same. Any help would be appreciated!
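For comparison, here is a toy sketch (hypothetical corpus, not my real data) fitting both GaussianNB and MultinomialNB on the same TF-IDF features. GaussianNB fits one Gaussian per feature, so with many near-zero TF-IDF dimensions its class log-likelihoods can differ enormously and the normalized probabilities tend to saturate at 0/1; MultinomialNB, which is designed for count/TF-IDF style features, typically gives softer values:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# Hypothetical toy data for illustration only
train = ["great fun movie", "loved this film", "wonderful acting",
         "terrible boring movie", "awful waste of time", "bad acting"]
labels = [1, 1, 1, 0, 0, 0]
test = ["boring but wonderful film"]

vec = TfidfVectorizer().fit(train)
X_train = vec.transform(train).toarray()
X_test = vec.transform(test).toarray()

gnb = GaussianNB().fit(X_train, labels)
mnb = MultinomialNB().fit(X_train, labels)

print(gnb.predict_proba(X_test))  # often saturated toward 0/1
print(mnb.predict_proba(X_test))  # strictly inside (0, 1) with smoothing
```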
Edit for the comments: the output of .predict_log_proba() is even stranger:
[[ 0.00000000e+00 -6.95947375e+09]
[-4.83948755e+09 0.00000000e+00]
[ 0.00000000e+00 -1.26497690e+10]
...
[ 0.00000000e+00 -6.97191054e+09]
[ 0.00000000e+00 -2.25589894e+09]
[ 0.00000000e+00 -2.93089863e+09]]
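These log-probabilities are actually consistent with the hard 0/1 output: predict_proba is just the exponential of predict_log_proba, and float64 underflows to exactly 0.0 once the exponent drops below roughly -745, so log-probabilities on the order of -1e9 (as above) become hard zeros. A minimal check:

```python
import math

# float64 can still represent exp() of moderately large negative exponents
print(math.exp(-700))            # a tiny but nonzero value

# ...but log-probabilities of magnitude ~1e9, as in the output above,
# underflow to exactly 0.0 when exponentiated
print(math.exp(-6.95947375e9))   # 0.0
```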