Monday, 7 October 2019

Outlier prediction with categorical data in Python's Scikit-Learn lib

I'm trying to make predictions on my own input data. I'm using Python's scikit-learn library with Isolation Forest as the algorithm. I don't know what I'm doing wrong, but whenever I try to transform my input data I get an error. The error occurs on this line:

    input_par = encoder.transform(val)#ERROR

This is the error: Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I have also tried this, but I still get an error:

    input_par = encoder.transform([val])#ERROR

This is the error: ValueError: Specifying the columns using strings is only supported for pandas DataFrames

What am I doing wrong, and how can I fix this error? Also, should I use OneHotEncoder, LabelEncoder or CountVectorizer?

This is my code:

    import pandas as pd

    from sklearn.ensemble import IsolationForest
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder, LabelEncoder

    textual_data = ['i love you', 'I love your dress', 'i like that', 'thats good', 'amazing', 'wrong', 'hi, how are you, are you doing good']
    num_data = [4, 1, 3, 2, 65, 3,3]

    df = pd.DataFrame({'my text': textual_data,
                       'num data': num_data})
    x = df

    # Transform the features
    encoder = ColumnTransformer(transformers=[('onehot', OneHotEncoder(), ['my text'])], remainder='passthrough')
    #encoder = ColumnTransformer(transformers=[('lab', LabelEncoder(), ['my text'])])

    x = encoder.fit_transform(x)

    isolation_forest = IsolationForest(contamination = 'auto', behaviour = 'new')
    model = isolation_forest.fit(x)

    list_of_val = [['good work',2], ['you are wrong',54], ['this was amazing',1]]

    for val in list_of_val:

        input_par = encoder.transform(val)#ERROR

        outlier = model.predict(input_par)
        #print(outlier)

        if outlier[0] == -1:
            print('Values', val, 'are outliers')

        else:
            print('Values', val, 'are not outliers')
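
Both errors seem to come from the shape and type of what is passed to `encoder.transform`. The `ColumnTransformer` selects `'my text'` by name, so it needs a pandas DataFrame with the same column names as the training data, not a plain list. Below is a minimal sketch of one possible fix, not a definitive answer. It assumes that `handle_unknown='ignore'` is acceptable for text values the encoder never saw during fitting (they are encoded as all zeros), it drops the `behaviour` argument because newer scikit-learn releases no longer accept it, and it adds `random_state=0` only to make runs repeatable.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import OneHotEncoder

columns = ['my text', 'num data']

df = pd.DataFrame({
    'my text': ['i love you', 'I love your dress', 'i like that', 'thats good',
                'amazing', 'wrong', 'hi, how are you, are you doing good'],
    'num data': [4, 1, 3, 2, 65, 3, 3],
})

# Assumption: unseen categories are ignored instead of raising an error.
encoder = ColumnTransformer(
    transformers=[('onehot', OneHotEncoder(handle_unknown='ignore'), ['my text'])],
    remainder='passthrough',
)
x = encoder.fit_transform(df)

# random_state is an addition for reproducibility, not part of the original code.
model = IsolationForest(contamination='auto', random_state=0).fit(x)

list_of_val = [['good work', 2], ['you are wrong', 54], ['this was amazing', 1]]

# Wrap the new rows in a DataFrame with the same column names used for fitting,
# so the ColumnTransformer can select 'my text' by name.
new_rows = pd.DataFrame(list_of_val, columns=columns)
input_par = encoder.transform(new_rows)
predictions = model.predict(input_par)

for val, outlier in zip(list_of_val, predictions):
    if outlier == -1:
        print('Values', val, 'are outliers')
    else:
        print('Values', val, 'are not outliers')
```

On the choice of encoder: `LabelEncoder` is documented as being for target labels rather than input features, which is probably why it does not fit into a `ColumnTransformer` here. With `OneHotEncoder`, every whole sentence is one category, so new sentences carry no text information after encoding. If the wording itself should matter, `CountVectorizer` applied to the `'my text'` column may be worth trying instead.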


