Wednesday, 26 July 2023

XGBoost requires int or float when I actually have int and float

I have the following data:

x_train and y_train are np.ndarray, and model is an xgboost.sklearn.XGBClassifier. The types are:

print(type(x_train))
print(x_train.dtype)

>> <class 'numpy.ndarray'>
>> float64

print(type(y_train))
print(y_train.dtype)

>> <class 'numpy.ndarray'>
>> float64

print(type(model))

>> xgboost.sklearn.XGBClassifier

I am using Databricks Runtime 12.2 LTS ML which corresponds to xgboost==1.7.2.

I am getting the following error:

model.fit(x_train, y_train)

>> XGBoostError: [09:28:22] ../src/data/data.cc:254: All feature_types must be one of {int, float, i, q, c}.

y_train is actually a vector of 1s and 0s. I have also tried casting it to np.int32 or np.int64. Then I tried casting to builtins.int and builtins.float, as such:

x_train = np.array(x_train, dtype=float)
y_train = np.array(y_train, dtype=int)
print(x_train.dtype)
print(y_train.dtype)

>> float64
>> int64

Same error as before.
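
One way to narrow this down would be to fit a default-configured classifier on the same arrays (a sketch of the idea, untested on my setup); if that succeeds, the dtypes of x_train and y_train are not the problem and the error comes from the model's configuration:

import xgboost as xgb

baseline = xgb.XGBClassifier(n_estimators=10)  # defaults only, no custom parameters
baseline.fit(x_train, y_train)                 # succeeds if the arrays' dtypes are acceptable to XGBoost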

I have checked this post, but it does not help me as my types are different. I would prefer not to convert away from numpy dtypes, since these have worked in the past and my config files are set up in such a way ..

Other relevant packages: sklearn==0.0.post7 and scikit-learn==1.0.2. You can reproduce the error as follows:

import numpy as np
import xgboost as xgb

params = {'base_score': 0.5,
 'booster': 'gbtree',
 'callbacks': 'null',
 'colsample_bylevel': 1,
 'colsample_bynode': 1,
 'colsample_bytree': 1,
 'early_stopping_rounds': 'null',
 'enable_categorical': False,
 'eval_metric': 'aucpr',
 'feature_types': 'null',
 'gamma': 7,
 'gpu_id': -1,
 'grow_policy': 'lossguide',
 'importance_type': 'null',
 'interaction_constraints': '',
 'learning_rate': 0.05610004032698376,
 'max_bin': 256,
 'max_cat_threshold': 64,
 'max_cat_to_onehot': 4,
 'max_delta_step': 0,
 'max_depth': 2,
 'max_leaves': 0,
 'min_child_weight': 1,
 'monotone_constraints': (),
 'n_estimators': 1275,
 'n_jobs': 4,
 'num_parallel_tree': 1,
 'objective': 'binary:logistic',
 'predictor': 'auto',
 'random_state': 0,
 'reg_alpha': 0,
 'reg_lambda': 60,
 'sampling_method': 'uniform',
 'scale_pos_weight': 11.507905606798213,
 'subsample': 1,
 'tree_method': 'hist',
 'use_label_encoder': False,
 'validate_parameters': 1,
 'verbosity': 0}

model = xgb.XGBClassifier(**params)
x = np.random.normal(0,1,(100,10)).astype(np.float64)
y = np.random.uniform(0,1,100).astype(np.int64)
model.fit(x,y)
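
Could the problem be that several of these parameters are the string 'null' rather than Python None (callbacks, early_stopping_rounds, feature_types, importance_type)? That is what exporting the params to JSON and reading them back would produce, and feature_types in particular is forwarded to the DMatrix that fit builds internally, where an unrecognized type string would trip exactly this validation. A minimal sketch of the workaround, assuming those 'null' strings are indeed the cause (params_clean is my own name, and the labels are regenerated so both classes are present):

# Replace JSON-style 'null' strings with Python None before building the model.
params_clean = {k: (None if v == 'null' else v) for k, v in params.items()}

model = xgb.XGBClassifier(**params_clean)
x = np.random.normal(0, 1, (100, 10)).astype(np.float64)
y = (np.random.uniform(0, 1, 100) > 0.5).astype(np.int64)  # binary labels, both classes present
model.fit(x, y)  # should train without the feature_types error if the 'null' strings were the cause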
 



from XGBoost requires int or float when I actually have int and float
