Monday, 6 December 2021

Sklearn Transformers: How to apply encoder to multiple columns and reuse it in production?

I am using a label encoder during training and want to use the same encoder in production by saving it and loading it later. The solutions I have found online only apply LabelEncoder to a single column at a time, like below:

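from sklearn.preprocessing import LabelEncoder

for col in col_list:
    # A new encoder is fitted per column and then discarded, so it cannot be reused
    df[col] = df[[col]].apply(LabelEncoder().fit_transform)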

In this case, how do I save the encoder and use it later? I tried fitting it on the entire dataframe, but I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
C:\Users\DA~1\AppData\Local\Temp/ipykernel_3884/730613134.py in <module>
----> 1 l_enc.fit_transform(df_join[le_col].astype(str))

~\anaconda3\envs\ReturnRate\lib\site-packages\sklearn\preprocessing\_label.py in fit_transform(self, y)
    113             Encoded labels.
    114         """
--> 115         y = column_or_1d(y, warn=True)
    116         self.classes_, y = _unique(y, return_inverse=True)
    117         return y

~\anaconda3\envs\ReturnRate\lib\site-packages\sklearn\utils\validation.py in column_or_1d(y, warn)
   1022         return np.ravel(y)
   1023 
-> 1024     raise ValueError(
   1025         "y should be a 1d array, got an array of shape {} instead.".format(shape)
   1026     )

ValueError: y should be a 1d array, got an array of shape (3949037, 14) instead.
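The traceback shows that LabelEncoder only accepts a 1d array, so it cannot be fitted on the whole dataframe in one call. One possible workaround, sketched below, is to keep one fitted LabelEncoder per column in a dictionary and serialize the dictionary. The toy dataframe, the column names, and the variable names `encoders`, `payload` and `restored` are assumptions made for illustration; they are not from the original post.

```python
import pickle

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the real training dataframe
df = pd.DataFrame({"color": ["red", "blue", "red"], "size": ["S", "M", "L"]})
col_list = ["color", "size"]

# Fit one encoder per column and keep each fitted encoder for later reuse
encoders = {}
for col in col_list:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col].astype(str))
    encoders[col] = le

# Serialize all encoders together (in practice, write these bytes to a file)
payload = pickle.dumps(encoders)

# Later, e.g. in production: restore the encoders and apply them to new rows
restored = pickle.loads(payload)
new_df = pd.DataFrame({"color": ["blue"], "size": ["L"]})
for col in col_list:
    new_df[col] = restored[col].transform(new_df[col].astype(str))
```

Note that `LabelEncoder.transform` raises a ValueError for a value it did not see during fitting, so new categories in production would need separate handling with this approach.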

I want to fit a label encoder on a dataframe with 10 columns (all categorical), save it, and load it later in production.
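For encoding several feature columns with a single object, scikit-learn's OrdinalEncoder may be a better fit than LabelEncoder, since it accepts 2d input and stores the categories of every column. The following is a minimal sketch, not a definitive solution: the toy data, the column names, the file name `ordinal_encoder.joblib`, and the choice of -1 for unseen categories are all assumptions. The `handle_unknown="use_encoded_value"` option requires scikit-learn 0.24 or later.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data standing in for the categorical training columns
train = pd.DataFrame({"color": ["red", "blue", "red"], "size": ["S", "M", "L"]})
cat_cols = ["color", "size"]

# A single encoder handles all columns; unseen categories are mapped to -1
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_encoded = enc.fit_transform(train[cat_cols].astype(str))

# Save the fitted encoder to disk (a temporary directory is used here)
path = os.path.join(tempfile.mkdtemp(), "ordinal_encoder.joblib")
joblib.dump(enc, path)

# Later, e.g. in production: load the encoder and transform new data
loaded = joblib.load(path)
prod = pd.DataFrame({"color": ["blue", "green"], "size": ["L", "S"]})
prod_encoded = loaded.transform(prod[cat_cols].astype(str))
```

The columns must be passed in the same order at transform time as at fit time. The encoder should also be loaded with a scikit-learn version compatible with the one that saved it.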


