Monday, 8 February 2021

xlearn predict reports a different MSE than the one computed from its predictions

The xlearn predict function reports a different MSE than the one you get by calculating it yourself from the predictions. The code below reproduces this; to run it, clone the xlearn repository and copy the code into demo/regression/house_price.

# Copyright (c) 2018 by contributors. All Rights Reserved.                                                                                                   
#                                                                                                                                                            
# Licensed under the Apache License, Version 2.0 (the "License");                                                                                            
# you may not use this file except in compliance with the License.                                                                                           
# You may obtain a copy of the License at                                                                                                                    
#                                                                                                                                                            
#     http://www.apache.org/licenses/LICENSE-2.0                                                                                                             
#                                                                                                                                                            
# Unless required by applicable law or agreed to in writing, software                                                                                        
# distributed under the License is distributed on an "AS IS" BASIS,                                                                                          
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.                                                                                   
# See the License for the specific language governing permissions and                                                                                        
# limitations under the License.                                                                                                                             
                                                                                                                                                             
import xlearn as xl                                                                                                                                          
import pandas as pd                                                                                                                                          
from sklearn.metrics import mean_squared_error                                                                                                               
                                                                                                                                                             
# Training task                                                                                                                                              
                                                                                                                                                             
# param:
#  0. regression task
#  1. learning rate: 0.0002
#  2. regular lambda: 0.00001
#  3. evaluation metric: rmse
#  4. epochs: 100
fm_model = xl.create_linear()  # Linear model (not a factorization machine, despite the variable name)
fm_model.setTrain("./house_price_train.txt")    # Training data                                                                                              
fm_model.setValidate("./house_price_test.txt")  # Validation data                                                                                            
fm_model.disableNorm()                                                                                                                                       
# fm_model.setSigmoid()                                                                                                                                      
# fm_model.disableEarlyStop()                                                                                                                                
param = {'task':'reg', 'lr':0.0002,                                                                                                                          
        'lambda':0.00001, 'metric':'rmse', 'epoch':100}                                                                                                      
                                                                                                                                                             
# Start to train                                                                                                                                             
# The trained model will be stored in model.out                                                                                                              
print("here")                                                                                                                                                
fm_model.fit(param, './model.out')                                                                                                                           
fm_model.setTest("./house_price_test.txt")  # Test data                                                                                                      
                                                                                                                                                             
# Prediction task                                                                                                                                            
                                                                                                                                                             
# Start to predict                                                                                                                                           
# The output result will be stored in output.txt                                                                                                             
outs = fm_model.predict("./model.out", "./output.txt")                                                                                                       
true = pd.read_csv("./house_price_test.txt", sep='\t', header=None)[0]                                                                                       
# print(true)                                                                                                                                                
preds = pd.read_csv("./output.txt", header=None)[0]                                                                                                          
                                                                                                                                                             
# Calculate using sklearn                                                                                                                                    
print(mean_squared_error(true, preds))                                                                                                                       
# Calculate by hand as a cross-check
sq = 0.0                                                                                                                                                     
for t, p in zip(true, preds):                                                                                                                                
    sq += (t - p) ** 2                                                                                                                                       
print(sq/len(true))                                                                                                                                          

Save the script as min_eg.py and, after installing xlearn, run it with python min_eg.py.

Here is the output you get:

[Image: output snapshot]

Interestingly, the MSE computed from the predictions is always exactly double the value reported by the predict function.
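One possible explanation, which I have not confirmed against the xlearn source, is that the reported value is the average of a squared loss defined with a factor of 1/2, i.e. 0.5 * (y - pred)^2, rather than the plain MSE. The sketch below uses made-up numbers and hypothetical helper names (mse, half_squared_loss) to show that such a convention would give exactly half the MSE:

```python
# Hypothetical illustration: these values are made up, not xlearn output.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.5, 2.0, 8.0]


def mse(true, pred):
    """Plain mean squared error: mean of (t - p)^2."""
    return sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true)


def half_squared_loss(true, pred):
    """Mean of 0.5 * (t - p)^2 (an assumed loss convention, not confirmed for xlearn)."""
    return sum(0.5 * (t - p) ** 2 for t, p in zip(true, pred)) / len(true)


# If the reported value followed the 1/2 convention, this ratio would always be 2.
ratio = mse(y_true, y_pred) / half_squared_loss(y_true, y_pred)
print(mse(y_true, y_pred), half_squared_loss(y_true, y_pred), ratio)
```

If this is indeed the cause, multiplying the reported value by 2 should reproduce the sklearn result on any dataset.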

Any help is greatly appreciated. Has anyone else encountered the same issue?


