Wednesday 7 July 2021

How to use f1-score for CrossValidator evaluator in a binary problem (BinaryClassificationEvaluator) in pyspark 2.3

My use case is a common one: binary classification with imbalanced labels, so we decided to use the F1 score for hyper-parameter selection via cross-validation. We are using PySpark 2.3 and pyspark.ml and create a CrossValidator object, but choosing its evaluator raises the following issues:

  • BinaryClassificationEvaluator does not offer the F1 score as an evaluation metric (it only supports areaUnderROC and areaUnderPR).
  • MulticlassClassificationEvaluator has an F1 metric, but it returns misleading results. My guess is that it calculates F1 for every class (in this case just 2) and returns some kind of average across them; since the negative class (y=0) is predominant, this yields a high F1 even when the model is really bad (the F1 score for the positive class is 0). The snippet after this list illustrates the problem.
  • MulticlassClassificationEvaluator gained the metricLabel parameter in recent versions (Spark 3.0+), which I think allows specifying which label to evaluate (in my case I would set it to 1), but it is not available in Spark 2.3.
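For illustration, here is a minimal sketch of the behaviour described in the second bullet, under the assumption that MulticlassClassificationEvaluator's "f1" metric is a support-weighted average of per-class F1 scores; the toy data and the trivial all-zeros predictions are made up for the example:

    from pyspark.sql import SparkSession
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    spark = SparkSession.builder.getOrCreate()

    # Toy skewed dataset: 95 negatives, 5 positives, and a useless model
    # that predicts 0.0 for every row.
    df = spark.createDataFrame(
        [(0.0, 0.0)] * 95 + [(0.0, 1.0)] * 5,
        ["prediction", "label"],
    )

    evaluator = MulticlassClassificationEvaluator(
        predictionCol="prediction", labelCol="label", metricName="f1")

    # Prints a high score (~0.93) even though F1 for the positive class is 0,
    # because the dominant negative class carries the weighted average.
    print(evaluator.evaluate(df))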

The problem is that I use a corporate/enterprise Spark cluster with no plans to upgrade from the current version (2.3). So the question is: how can I use the F1 score in a CrossValidator evaluator for the binary case, given that we are restricted to Spark 2.3?
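One workaround we are considering (a minimal sketch, assuming CrossValidator accepts any Evaluator subclass, which it should since it only calls evaluate and isLargerBetter): implement a custom Python Evaluator that computes F1 for the positive label directly from the prediction DataFrame. The class name PositiveF1Evaluator and its column defaults are ours, not a Spark API:

    import pyspark.sql.functions as F
    from pyspark.ml.evaluation import Evaluator

    class PositiveF1Evaluator(Evaluator):
        """Hypothetical helper: F1 score of the positive class (label == 1.0)."""

        def __init__(self, predictionCol="prediction", labelCol="label"):
            super(PositiveF1Evaluator, self).__init__()
            self.predictionCol = predictionCol
            self.labelCol = labelCol

        def _evaluate(self, dataset):
            pred, lab = F.col(self.predictionCol), F.col(self.labelCol)
            # Count true positives, false positives and false negatives
            # for the positive class in a single pass over the data.
            row = dataset.select(
                F.sum(((pred == 1.0) & (lab == 1.0)).cast("double")).alias("tp"),
                F.sum(((pred == 1.0) & (lab == 0.0)).cast("double")).alias("fp"),
                F.sum(((pred == 0.0) & (lab == 1.0)).cast("double")).alias("fn"),
            ).first()
            tp, fp, fn = row["tp"] or 0.0, row["fp"] or 0.0, row["fn"] or 0.0
            if tp == 0.0:
                return 0.0  # no true positives: precision, recall and F1 are all 0
            precision = tp / (tp + fp)
            recall = tp / (tp + fn)
            return 2.0 * precision * recall / (precision + recall)

        def isLargerBetter(self):
            # CrossValidator keeps the model with the highest metric value.
            return True

It would plug into CrossValidator as usual (pipeline and grid are placeholders for our estimator and parameter grid):

    from pyspark.ml.tuning import CrossValidator

    cv = CrossValidator(estimator=pipeline, estimatorParamMaps=grid,
                        evaluator=PositiveF1Evaluator(), numFolds=5)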



from How to use f1-score for CrossValidator evaluator in a binary problem(BinaryClassificationEvaluator) in pyspark 2.3
