I want to grid search over a set of hyperparameters to tune a clustering model. GridSearchCV offers several built-in scoring functions for unsupervised learning, but I want to use one that isn't among them, e.g. silhouette score.
The documentation is unclear on how to define a custom scoring function. The example there simply imports a custom scorer and wraps it with make_scorer to create the scoring function. However, make_scorer seems to require the true values (which don't exist in unsupervised learning), so it's not clear how to use it here.
Here's what I have so far:
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score, make_scorer

def my_custom_function(model, X):
    preds = model.predict(X)
    return silhouette_score(X, preds)

Z, _ = make_blobs()
model = DBSCAN()
pgrid = {'eps': [0.1*i for i in range(1,6)]}
gs = GridSearchCV(model, pgrid, scoring=my_custom_function)
gs.fit(Z)
best_score = gs.score(Z)
But it throws two errors:
TypeError: my_custom_function() takes 2 positional arguments but 3 were given
and
AttributeError: 'DBSCAN' object has no attribute 'predict'
How do I correctly define my custom scoring function?
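One possible direction, offered as a minimal sketch rather than a definitive answer: the TypeError suggests the scorer is called as scorer(estimator, X, y), so giving it a y=None parameter should cover that call. The AttributeError arises because DBSCAN has no predict method; it only exposes labels_ for the data it was fitted on. The sketch below therefore assumes a single "split" that trains and scores on the same rows, so that labels_ lines up with the X passed to the scorer. The name silhouette_scorer, the make_blobs settings, and the fallback score of -1.0 are illustrative assumptions, not part of the original code.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import GridSearchCV


def silhouette_scorer(estimator, X, y=None):
    """Hypothetical scorer; accepts (estimator, X) or (estimator, X, y)."""
    # DBSCAN cannot predict on new data, so reuse the labels from fit().
    # This is only valid because the split below scores on the training rows.
    labels = estimator.labels_
    n_labels = len(set(labels))
    # silhouette_score needs between 2 and n_samples - 1 distinct labels;
    # return the worst possible value otherwise (an assumed convention).
    # Note that the noise label -1 is counted as a cluster here.
    if n_labels < 2 or n_labels >= len(X):
        return -1.0
    return silhouette_score(X, labels)


# Illustrative data; these make_blobs arguments are assumptions.
Z, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.5, random_state=0)

# One "fold" whose train and test indices are both every row.
all_rows = np.arange(len(Z))
cv = [(all_rows, all_rows)]

pgrid = {"eps": [0.1 * i for i in range(1, 6)]}
gs = GridSearchCV(DBSCAN(), pgrid, scoring=silhouette_scorer, cv=cv)
gs.fit(Z)

best_eps = gs.best_params_["eps"]
best_score = gs.best_score_
```

The trade-off is that this is no longer cross-validation in any real sense: GridSearchCV is used only as a convenient loop over the parameter grid, and each candidate is scored on the same data it was fitted on.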
from How to use a custom scoring function in GridSearchCV for unsupervised learning