Tuesday, 12 September 2023

NLTK sentence_bleu() returns 0 while evaluating Chinese sentences

I'm trying to compute BLEU scores for Chinese sentences with NLTK's sentence_bleu() function. The code is as follows:

import nltk

from transformers import BertTokenizer, BartForConditionalGeneration

src = '樓上漏水耍花招不處理可以怎麼做'
ref = '上層漏水耍手段不去處理可以怎麼做'

checkpoint = 'fnlp/bart-base-chinese'
tokenizer = BertTokenizer.from_pretrained(checkpoint)
model = BartForConditionalGeneration.from_pretrained(checkpoint)

hypothesis_translations = []

for sentence in [src]:
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=100, return_token_type_ids=False)
    outputs = model.generate(**inputs)
    translated_sentence = tokenizer.decode(outputs[0], skip_special_tokens=True)
    hypothesis_translations.append(translated_sentence)

# reference "tokenization" (this actually runs the reference through
# model.generate() and decodes the output, rather than tokenizing it)
inputs_ref = tokenizer(ref, return_tensors="pt", truncation=True, max_length=100, return_token_type_ids=False)
outputs_ref = model.generate(**inputs_ref)
tokenized_ref = tokenizer.decode(outputs_ref[0], skip_special_tokens=True)

nltk_bleu = nltk.translate.bleu_score.sentence_bleu(tokenized_ref, hypothesis_translations)
print(nltk_bleu)

Printing nltk_bleu outputs 0.

But when I use the corpus_score() method of the SacreBLEU library, it returns a nonzero, seemingly normal result:

from sacrebleu.metrics import BLEU

bleu = BLEU()
bleu_score = bleu.corpus_score(references=tokenized_ref, hypotheses=hypothesis_translations)
print(bleu_score)

which returns:

BLEU = 4.79 73.3/3.6/1.9/1.0 (BP = 1.000 ratio = 15.000 hyp_len = 15 ref_len = 1)
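
A side note on that SacreBLEU output: ref_len = 1 suggests the reference is also being consumed in an unintended way, since corpus_score() expects hypotheses to be a list of strings and references to be a list of reference streams, each stream itself a list of strings. A call shaped like the following sketch (reusing tokenized_ref and hypothesis_translations from above; tokenize='zh' selects SacreBLEU's built-in Chinese tokenizer) would match that signature:

from sacrebleu.metrics import BLEU

bleu = BLEU(tokenize='zh')  # use SacreBLEU's built-in Chinese tokenizer

# hypotheses: one string per segment;
# references: one inner list per reference stream, one string per segment.
bleu_score = bleu.corpus_score(
    hypotheses=hypothesis_translations,
    references=[[tokenized_ref]],
)
print(bleu_score)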

How can I make NLTK's sentence_bleu() return correct results?
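
For reference, here is a minimal sketch of the call shape sentence_bleu() expects. It compares token lists, not raw strings: the first argument is a list of references, each itself a list of tokens, and the hypothesis is a token list too. Passing tokenized_ref as a bare string makes NLTK treat each character as a separate reference, while the one-element hypothesis list is read as a single "token", so essentially nothing matches and the score collapses to 0. The sketch below assumes jieba word segmentation (character-level tokens via list(...) would also work) and adds smoothing so that short sentences with no higher-order n-gram overlap do not score a hard 0:

import jieba
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Segment both the reference and the hypothesis into word lists.
ref_tokens = jieba.lcut(ref)
hyp_tokens = jieba.lcut(hypothesis_translations[0])

# First argument is a list of reference token lists; smoothing (method1)
# keeps the score nonzero when some n-gram order has no overlap.
smoothie = SmoothingFunction().method1
nltk_bleu = sentence_bleu([ref_tokens], hyp_tokens, smoothing_function=smoothie)
print(nltk_bleu)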




