Tuesday 24 October 2023

How to use a custom scoring function in GridSearchCV for unsupervised learning

I want to grid search over a set of hyperparameters to tune a clustering model. GridSearchCV offers a number of built-in scoring functions for unsupervised learning, but I want to use one that isn't among them, e.g. silhouette score.

The documentation is unclear on how a custom scoring function should be defined. The example there simply imports a custom scorer and uses make_scorer to create a custom scoring function. However, make_scorer seems to require the true values (which don't exist in unsupervised learning), so it's not clear how to use it.

Here's what I have so far:

from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score, make_scorer

def my_custom_function(model, X):
    preds = model.predict(X)
    return silhouette_score(X, preds)

Z, _ = make_blobs()

model = DBSCAN()
pgrid = {'eps': [0.1*i for i in range(1,6)]}
gs = GridSearchCV(model, pgrid, scoring=my_custom_function)
gs.fit(Z)
best_score = gs.score(Z)

But it throws two errors:

TypeError: my_custom_function() takes 2 positional arguments but 3 were given

and

AttributeError: 'DBSCAN' object has no attribute 'predict'

How do I correctly define my custom scoring function?
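
For reference, here is a minimal sketch of one possible way around both errors, not a definitive answer. A callable passed as scoring is expected to accept (estimator, X, y), so giving the function a y=None parameter would address the TypeError. Since DBSCAN has no predict, the sketch calls fit_predict on whatever data the scorer receives. The name silhouette_scorer, the -1.0 fallback score, and the single full-data split passed as cv are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import GridSearchCV


def silhouette_scorer(estimator, X, y=None):
    """Scorer with the (estimator, X, y) signature; y is accepted but unused."""
    # DBSCAN has no predict(), so re-cluster the data handed to the scorer.
    labels = estimator.fit_predict(X)
    n_labels = len(np.unique(labels))  # the noise label (-1) counts as a label
    # silhouette_score needs between 2 and n_samples - 1 distinct labels;
    # returning -1.0 otherwise is an arbitrary "worst score" (assumption).
    if n_labels < 2 or n_labels >= len(X):
        return -1.0
    return silhouette_score(X, labels)


Z, _ = make_blobs(random_state=0)

# A single "split" whose train and test parts are both the full dataset
# (assumption: held-out evaluation is not wanted for this clustering model).
all_idx = np.arange(len(Z))
pgrid = {"eps": [0.1 * i for i in range(1, 6)]}
gs = GridSearchCV(DBSCAN(), pgrid, scoring=silhouette_scorer, cv=[(all_idx, all_idx)])
gs.fit(Z)
best_score = gs.score(Z)
print(gs.best_params_, best_score)
```

Because the scorer re-clusters the data it is given, it also works with the default cross-validation splits, although each score would then describe a clustering of the test fold only.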



