Hemant Vishwakarma: How to predict data outside of the training data set

Sunday, 13 December 2020

How to predict data outside of the training data set

I am using this model to predict country names from addresses:

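import re
import numpy as np
import pandas as pd
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

def normalize_text(s):
    # Lowercase, then drop punctuation that touches whitespace and collapse runs of spaces.
    # Raw strings keep '\s' and '\W' from being read as invalid string escapes.
    s = s.lower()
    s = re.sub(r'\s\W', ' ', s)
    s = re.sub(r'\W\s', ' ', s)
    s = re.sub(r'\s+', ' ', s)
    return s

# df is a DataFrame with 'Full_Address' and 'CountryName' columns (loading not shown).
df['TEXT'] = [normalize_text(s) for s in df['Full_Address']]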

vectorizer = CountVectorizer()
x = vectorizer.fit_transform(df['TEXT'])

encoder = LabelEncoder()
y = encoder.fit_transform(df['CountryName'])

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

nb = MultinomialNB()
nb.fit(x_train, y_train)
y_predicted = nb.predict(x_test)
accuracy_score(y_test, y_predicted)

I want to use the model I built to predict the country for a single address string. How can I do this? I tried:

nb.predict('1100 112th Ave NE #400, Bellevue, WA 98004, United States')

ValueError: Expected 2D array, got scalar array instead:
array=1100 112th Ave NE #400, Bellevue, WA 98004, United States.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
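
For context, the error is consistent with nb having been fitted on the token-count matrix produced by CountVectorizer rather than on raw strings, so predict() expects a 2D input with one column per vocabulary token. A minimal sketch on made-up data (the addresses and labels below are hypothetical stand-ins for df['TEXT'] and df['CountryName']) shows the width the fitted model expects:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for df['TEXT'] and df['CountryName'].
texts = [
    "10 downing street london united kingdom",
    "1600 pennsylvania ave washington dc united states",
    "221b baker street london united kingdom",
    "350 fifth ave new york ny united states",
]
labels = ["United Kingdom", "United States", "United Kingdom", "United States"]

vectorizer = CountVectorizer()
# Sparse matrix: one row per address, one column per vocabulary token.
x = vectorizer.fit_transform(texts)
nb = MultinomialNB().fit(x, labels)

n_samples, n_features = x.shape
vocab_size = len(vectorizer.vocabulary_)
# The fitted model keeps one count per (class, token) pair, so anything passed
# to predict() needs exactly this many columns.
model_width = nb.feature_count_.shape[1]
print(n_samples, n_features, vocab_size, model_width)
```

A bare string has no columns at all, which is why scikit-learn reports a scalar where it wanted a 2D array.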

UPDATE:

As suggested in an answer, I tried wrapping the string in a nested list:

nb.predict([['1100 112th Ave NE #400, Bellevue, WA 98004, United States']])

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 82043 is different from 1)
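
The second error suggests the shape is now 2D but still has only 1 column, while the model was fitted on 82043 columns (presumably the vocabulary size). One way to resolve this, sketched here on made-up data, is to pass the new address through the same normalize_text function and the same fitted vectorizer, using transform rather than fit_transform, and then map the predicted id back to a name with the encoder. The training addresses and the helper name predict_country are assumptions for illustration only.

```python
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder

def normalize_text(s):
    s = s.lower()
    s = re.sub(r'\s\W', ' ', s)
    s = re.sub(r'\W\s', ' ', s)
    s = re.sub(r'\s+', ' ', s)
    return s

# Hypothetical training data standing in for df['Full_Address'] and df['CountryName'].
addresses = [
    "10 Downing Street, London, United Kingdom",
    "1600 Pennsylvania Ave NW, Washington, DC 20500, United States",
    "221B Baker Street, London, United Kingdom",
    "350 Fifth Ave, New York, NY 10118, United States",
]
countries = ["United Kingdom", "United States", "United Kingdom", "United States"]

vectorizer = CountVectorizer()
x = vectorizer.fit_transform([normalize_text(a) for a in addresses])

encoder = LabelEncoder()
y = encoder.fit_transform(countries)

nb = MultinomialNB().fit(x, y)

def predict_country(address):
    # Hypothetical helper: reuse the fitted vectorizer so the single sample
    # has the same columns as the training matrix, shape (1, n_features).
    features = vectorizer.transform([normalize_text(address)])
    label_id = nb.predict(features)
    # Convert the integer class id back to the original country name.
    return encoder.inverse_transform(label_id)[0]

result = predict_country('1100 112th Ave NE #400, Bellevue, WA 98004, United States')
print(result)
```

Tokens that never appeared in training are simply ignored by transform, so an unseen address still yields a row of the right width. Calling fit_transform on the new string instead would build a fresh vocabulary and reproduce the dimension mismatch.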
