Monday, 22 May 2023

HuggingFace Evaluate a Fine-tuned Zero-Shot Model

I am fine-tuning the HuggingFace facebook/bart-large-mnli model for my use case, using the following parameters:

from transformers import BartForSequenceClassification, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir=model_directory,      # output directory
    num_train_epochs=30,             # total number of training epochs
    per_device_train_batch_size=1,   # batch size per device during training (originally 16; anything above 1 runs out of memory)
    per_device_eval_batch_size=2,    # batch size for evaluation (originally 64; anything above 2 runs out of memory)
    warmup_steps=500,                # number of warmup steps for the learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
)

model = BartForSequenceClassification.from_pretrained("facebook/bart-large-mnli")

trainer = Trainer(
    model=model,                          # the instantiated 🤗 Transformers model to be trained
    args=training_args,                   # training arguments, defined above
    compute_metrics=compute_metrics,      # a function to compute the metrics
    train_dataset=train_dataset,          # training dataset
    eval_dataset=test_dataset             # evaluation dataset
)

# Train the model
trainer.train()
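Because any per-device batch size above 1 runs out of memory, one thing that may be worth trying (untested here) is the gradient_accumulation_steps parameter of TrainingArguments, which accumulates gradients over several small batches before each optimizer update. The sketch below only does the arithmetic: optimizer_steps is a hypothetical helper, the dataset size of 1,000 examples is made up, and the Trainer's own step count may differ slightly from this approximation. It also shows how much of training warmup_steps=500 covers, since warmup is counted in optimizer updates, not in forward passes.

```python
import math
from typing import Tuple


def optimizer_steps(num_examples: int, per_device_batch_size: int,
                    grad_accum_steps: int = 1, num_devices: int = 1,
                    num_epochs: int = 1) -> Tuple[int, int]:
    """Return (effective batch size, approximate total optimizer updates)."""
    # Examples that contribute to a single optimizer update.
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    # One update per effective batch; a final partial batch still counts as one.
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * num_epochs


# Hypothetical dataset of 1,000 training examples, 30 epochs as in the question.
print(optimizer_steps(1000, per_device_batch_size=1, num_epochs=30))  # (1, 30000)
print(optimizer_steps(1000, per_device_batch_size=1,
                      grad_accum_steps=16, num_epochs=30))            # (16, 1890)
```

With 16 accumulation steps the 500 warmup updates would cover roughly a quarter of training (500 / 1890 ≈ 26%), versus under 2% (500 / 30000) without accumulation, so warmup_steps may need to shrink accordingly.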

The compute_metrics function I use is:

import numpy as np
from datasets import Dataset, load_metric
from transformers import EvalPrediction

def compute_metrics(p: EvalPrediction):
    metric_acc = load_metric("accuracy")
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    preds = np.argmax(preds, axis=1)
    result = {}
    result["accuracy"] = metric_acc.compute(predictions=preds, references=p.label_ids)["accuracy"]
    return result

However, no matter how much training or test data I use, or how many epochs I train for, trainer.evaluate() always reports an accuracy of 0.5.
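An accuracy of exactly 0.5 that never moves is what you would see if the model predicts the same class for every example on a test set where the two classes are equally frequent. A quick way to test that hypothesis is to compare the distribution of predicted classes with the label distribution. This is a minimal sketch: prediction_summary is a hypothetical helper, and it assumes raw logits of shape (num_examples, 3) like those from the bart-large-mnli head. In practice the logits could come from trainer.predict(test_dataset).predictions (taking the first element if it is a tuple, as compute_metrics above does) and the labels from its label_ids.

```python
import numpy as np


def prediction_summary(logits, labels):
    """Accuracy plus how often each class is predicted vs. how often it is the label."""
    logits = np.asarray(logits)
    labels = np.asarray(labels)
    preds = np.argmax(logits, axis=1)   # same reduction the compute_metrics above uses
    num_classes = logits.shape[1]       # 3 for the bart-large-mnli head
    return {
        "accuracy": float(np.mean(preds == labels)),
        "pred_counts": np.bincount(preds, minlength=num_classes).tolist(),
        "label_counts": np.bincount(labels, minlength=num_classes).tolist(),
    }


# Toy example: logits that always favour class 2 on a balanced 0/2 test set.
print(prediction_summary([[-1.0, 0.0, 2.0]] * 4, [0, 2, 0, 2]))
# {'accuracy': 0.5, 'pred_counts': [0, 0, 4], 'label_counts': [2, 0, 2]}
```

If pred_counts puts everything in a single class on the real test set, the problem lies in training (learning rate, batch size, label mapping) rather than in the metric code.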

My questions are:

  1. How do I improve it?
  2. How do I add other evaluation metrics, for example the F1 score?

I tried extending compute_metrics to also report F1:

def compute_metrics(p: EvalPrediction):
    load_accuracy = load_metric("accuracy")
    load_f1 = load_metric("f1")
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    preds = np.argmax(preds, axis=1)
    result = {}
    result["accuracy"] = load_accuracy.compute(predictions=preds, references=p.label_ids)["accuracy"]
    result["f1"] = load_f1.compute(predictions=preds, references=p.label_ids)["f1"]
    return result

But then I got this error while running trainer.evaluate():

ValueError: pos_label=1 is not a valid label. It should be one of [0, 2]


More details about my fine-tuning setup are in my previous question.


