Wednesday 27 October 2021

Tensorflow decision forest custom metric vs. number of trees

I have created a classification model using TensorFlow Decision Forests. I'm struggling to evaluate how the performance changes with the number of trees for a non-default metric (in this case PR-AUC).

Below is some code with my attempts.

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
import pandas as pd
import tensorflow as tf
import tensorflow_decision_forests as tfdf

train = load_diabetes()
X = pd.DataFrame(train['data'])
X['target'] = (pd.Series(train['target']) > 100).astype(int)
X_train, X_test = train_test_split(X)
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(X_train, label="target")
test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(X_test, label="target")
pr_auc = tf.keras.metrics.AUC(curve='PR')
tfdf_clf = tfdf.keras.GradientBoostedTreesModel()
tfdf_clf.compile(metrics=[pr_auc])
tfdf_clf.fit(train_ds, validation_data=test_ds)

Now I get very useful training logs using

tfdf_clf.make_inspector().training_logs()
#[TrainLog(num_trees=1, evaluation=Evaluation(num_examples=None, accuracy=0.9005518555641174, loss=0.6005926132202148, rmse=None, ndcg=None, aucs=None)),
#TrainLog(num_trees=2, evaluation=Evaluation(num_examples=None, accuracy=0.9005518555641174, loss=0.5672071576118469, rmse=None, ndcg=None, aucs=None)),

But they don't contain any information on PR-AUC vs. iterations.

If I evaluate the model, it only reports PR-AUC at the end of training, although it seems to log some intermediate info.

tfdf_clf.evaluate(test_ds)

1180/1180 [==============================] - 10s 8ms/step - loss: 0.0000e+00 - auc: 0.6832

How can I find how test-data PR-AUC changes with the number of trees? I specifically need to use the TensorFlow Decision Forests library.
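For what it's worth, the kind of per-iteration metric tracking the question asks for can be illustrated with scikit-learn's gradient boosting, which exposes staged predictions directly. This is only a sketch of the idea with scikit-learn standing in for TF-DF (`GradientBoostingClassifier` and `staged_predict_proba` are scikit-learn APIs, not TF-DF ones), using the same binarized diabetes dataset as above:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

data = load_diabetes()
X, y = data["data"], (data["target"] > 100).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# staged_predict_proba yields the model's predictions after each
# additional tree, so a metric can be evaluated at every iteration
# without retraining. average_precision_score is a common summary
# of the precision-recall curve (a PR-AUC stand-in).
pr_auc_per_tree = [
    average_precision_score(y_test, proba[:, 1])
    for proba in clf.staged_predict_proba(X_test)
]
# pr_auc_per_tree now has one PR-AUC value per tree (50 here)
```

Whether TF-DF offers an equivalent of staged prediction, or whether one has to retrain with increasing `num_trees` and evaluate each model, is exactly what the question is asking.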



from Tensorflow decision forest custom metric vs. number of trees
