I'm trying to evaluate Chinese sentence BLEU scores with NLTK's sentence_bleu()
function. The code is as follows:
import nltk
import jieba
from transformers import AutoTokenizer, BertTokenizer, BartForConditionalGeneration
src = '樓上漏水耍花招不處理可以怎麼做'
ref = '上層漏水耍手段不去處理可以怎麼做'
checkpoint = 'fnlp/bart-base-chinese'
tokenizer = BertTokenizer.from_pretrained(checkpoint)
model = BartForConditionalGeneration.from_pretrained(checkpoint)
hypothesis_translations = []
for sentence in [src]:
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=100, return_token_type_ids=False)
    outputs = model.generate(**inputs)
    translated_sentence = tokenizer.decode(outputs[0], skip_special_tokens=True)
    hypothesis_translations.append(translated_sentence)
# for Reference tokenization
inputs_ref = tokenizer(ref, return_tensors="pt", truncation=True, max_length=100, return_token_type_ids=False)
outputs_ref = model.generate(**inputs_ref)
tokenized_ref = tokenizer.decode(outputs_ref[0], skip_special_tokens=True)
nltk_bleu = nltk.translate.bleu_score.sentence_bleu(tokenized_ref, hypothesis_translations)
print(nltk_bleu)
Printing nltk_bleu outputs 0.
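For reference, NLTK's sentence_bleu() expects a list of tokenized references (each a list of tokens) and a tokenized hypothesis, not raw strings. The sketch below is a minimal illustration of those input shapes, assuming character-level tokenization for Chinese (a common choice) and using the src/ref strings above directly in place of model output:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

ref = '上層漏水耍手段不去處理可以怎麼做'
hyp = '樓上漏水耍花招不處理可以怎麼做'

# Tokenize character by character; list(str) splits a string into characters.
ref_tokens = list(ref)
hyp_tokens = list(hyp)

# First argument is a LIST of reference token lists, second is one hypothesis token list.
# Smoothing guards against a hard 0 when some n-gram order has no overlap.
smooth = SmoothingFunction().method1
score = sentence_bleu([ref_tokens], hyp_tokens, smoothing_function=smooth)
print(score)
```

Passing strings instead of token lists makes NLTK iterate over them character by character in unintended ways, which is consistent with a score of 0.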
But when I use the corpus_score() method of the SacreBLEU library, it returns normal, expected results:
import evaluate
from sacrebleu.metrics import BLEU
bleu = BLEU()
bleu_score = bleu.corpus_score(references=tokenized_ref, hypotheses=hypothesis_translations)
print(bleu_score)
which returns:
BLEU = 4.79 73.3/3.6/1.9/1.0 (BP = 1.000 ratio = 15.000 hyp_len = 15 ref_len = 1)
How can I make NLTK's sentence_bleu() return correct results?
From: NLTK sentence_bleu() returns 0 while evaluating Chinese sentences