the xlearn predict function gives a different mse than what you get by looking at the predictions and calculating it yourself. Here is code to do this; you can run it by cloning the xlearn repository and copying the below code in demo/regression/house_price in the repository
# Copyright (c) 2018 by contributors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import xlearn as xl
import pandas as pd
from sklearn.metrics import mean_squared_error
# Training task
# param:
# 0. regression task
# 1. learning rate: 0.2
# 2. regular lambda: 0.002
# 3. evaluation metric: mae
fm_model = xl.create_linear() # Use factorization machine
fm_model.setTrain("./house_price_train.txt") # Training data
fm_model.setValidate("./house_price_test.txt") # Validation data
fm_model.disableNorm()
# fm_model.setSigmoid()
# fm_model.disableEarlyStop()
param = {'task':'reg', 'lr':0.0002,
'lambda':0.00001, 'metric':'rmse', 'epoch':100}
# Start to train
# The trained model will be stored in model.out
print("here")
fm_model.fit(param, './model.out')
fm_model.setTest("./house_price_test.txt") # Test data
# Prediction task
# Start to predict
# The output result will be stored in output.txt
outs = fm_model.predict("./model.out", "./output.txt")
true = pd.read_csv("./house_price_test.txt", sep='\t', header=None)[0]
# print(true)
preds = pd.read_csv("./output.txt", header=None)[0]
# Calculate using sklearn
print(mean_squared_error(true, preds))
# Self calculate
sq = 0.0
for t, p in zip(true, preds):
sq += (t - p) ** 2
print(sq/len(true))
If you save it as min_eg.py, run it (after installing xlearn) as python min_eg.py simply.
Here is the output you get:
Interestingly the MSE you get is always exactly double of the reported mse from the predict function.
Any help is greatly appreciated; and I wonder if other folks have encountered the same issue.
from xlearn predictions error give a different mse than output by the function

No comments:
Post a Comment