I was learning the MLP regressor in Google Colaboratory and ran the following code:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data = np.array(table)
scaler.fit(data)
y_index = data.shape[1]-1
sd_x = (scaler.var_[:y_index])**0.5
sd_y = (scaler.var_[y_index])**0.5
mean_x = scaler.mean_[:y_index]
mean_y = scaler.mean_[y_index]
x = (data[:, :y_index]).astype(np.float32)
y = (data[:, y_index]).astype(np.float32)
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.25)
print('Separate training and testing sets!')
It gave the error ValueError: could not convert string to float: 'Photo Editor & Candy Camera & Grid & ScrapBook'.
So I checked the question RandomForestClassfier.fit(): ValueError: could not convert string to float. I also tried sklearn-LinearRegression: could not convert string to float: '--'.
I changed fit(data) to fit_transform(data), but the same error persisted. Then I switched from StandardScaler to LabelEncoder, replacing scaler = StandardScaler() with scaler = LabelEncoder(). That produced a different error, ValueError: bad input shape (10841, 13), on the line scaler.fit_transform(data).
You can check the CSV from Kaggle here. The CSV contains both strings and numbers without quotation marks (except the prices, which are wrapped in double quotation marks).
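For context on the two errors: StandardScaler only accepts numeric input, so it fails on the first string cell it meets, and LabelEncoder is meant for a single 1-D column of labels, so it rejects a 2-D array of shape (10841, 13). Below is a minimal sketch of one possible way around both, not a definitive fix. It assumes the data is loaded as a pandas DataFrame, that the target is a numeric column, and that ordinal codes are acceptable for the string columns. The tiny `table` and its column names are hypothetical stand-ins for the Kaggle CSV.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Hypothetical stand-in for the Kaggle CSV: string and numeric columns mixed.
table = pd.DataFrame({
    "App": [
        "Photo Editor & Candy Camera & Grid & ScrapBook",
        "Coloring book moana",
        "Sketch - Draw & Paint",
    ],
    "Category": ["ART_AND_DESIGN", "ART_AND_DESIGN", "ART_AND_DESIGN"],
    "Reviews": [159, 967, 215644],
    "Rating": [4.1, 3.9, 4.5],
})

# Assumption: "Rating" is the numeric target; everything else is a feature.
features = table.drop(columns="Rating")
target = table["Rating"].to_numpy(dtype=np.float32)

# Split the feature columns by dtype: numbers get scaled, strings get encoded.
num_cols = features.select_dtypes(include="number").columns.tolist()
text_cols = features.select_dtypes(exclude="number").columns.tolist()

# OrdinalEncoder works column-wise on 2-D input (unlike LabelEncoder),
# and StandardScaler is applied only to the numeric columns.
preprocess = ColumnTransformer([
    ("text", OrdinalEncoder(), text_cols),
    ("num", StandardScaler(), num_cols),
])

x = preprocess.fit_transform(features).astype(np.float32)
print(x.shape, target.shape)
```

Ordinal codes impose an arbitrary order on the strings, so a one-hot encoding may suit some columns better. Columns that look numeric but are read as strings (for example, the quoted prices) would need to be cleaned and converted before scaling.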
from Python's "StandardScaler" and "LabelEncoder", and "fit" and "fit_transform" do not work with a CSV which contains both float and string