I have a list of sentences for which I'm trying to calculate perplexity with several models, using this code:
from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch
import numpy as np

model_name = 'cointegrated/rubert-tiny'
model = AutoModelForMaskedLM.from_pretrained(model_name).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_name)

def score(model, tokenizer, sentence):
    tensor_input = tokenizer.encode(sentence, return_tensors='pt')
    repeat_input = tensor_input.repeat(tensor_input.size(-1) - 2, 1)
    mask = torch.ones(tensor_input.size(-1) - 1).diag(1)[:-2]
    masked_input = repeat_input.masked_fill(mask == 1, tokenizer.mask_token_id)
    labels = repeat_input.masked_fill(masked_input != tokenizer.mask_token_id, -100)
    with torch.inference_mode():
        loss = model(masked_input.cuda(), labels=labels.cuda()).loss
    return np.exp(loss.item())

print(score(sentence='London is the capital of Great Britain.', model=model, tokenizer=tokenizer))
# 4.541251105675365
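To show what score builds before it calls the model, here is the masking step on its own, run on the CPU. This is only an illustrative sketch: the token ids and the mask id 103 are made-up values, not what the rubert-tiny tokenizer would actually produce.

```python
import torch

# Hypothetical ids: 101 and 102 stand in for [CLS] and [SEP], 103 for [MASK].
tensor_input = torch.tensor([[101, 7, 8, 9, 102]])
mask_token_id = 103

n = tensor_input.size(-1)                      # 5 tokens including specials
repeat_input = tensor_input.repeat(n - 2, 1)   # one row per real token: (n-2, n)
mask = torch.ones(n - 1).diag(1)[:-2]          # row i has a 1 at column i+1
masked_input = repeat_input.masked_fill(mask == 1, mask_token_id)
labels = repeat_input.masked_fill(masked_input != mask_token_id, -100)

print(masked_input.shape)     # torch.Size([3, 5])
print(masked_input.tolist())  # each row masks a different non-special token
print(labels.tolist())        # -100 everywhere except the masked position
```

The batch has n - 2 rows of n tokens each, so the number of token positions sent to the model in one call grows quadratically with sentence length.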
Most models work well, but some sentences seem to throw an error:
RuntimeError: CUDA out of memory. Tried to allocate 10.34 GiB (GPU 0; 23.69 GiB total capacity; 10.97 GiB already allocated; 6.94 GiB free; 14.69 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
That makes sense, because some of the sentences are very long. So what I did was to add something like try, except RuntimeError, pass.
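A wrapper of that kind could look roughly like the sketch below. The helper name score_or_none is hypothetical, and checking the message text for "out of memory" is an assumption about how to tell this error apart from other RuntimeErrors. Unlike a bare pass, it re-raises anything that is not an out-of-memory error, so other failures are not silently hidden.

```python
import torch

def score_or_none(score_fn, sentence):
    """Return score_fn(sentence), or None if it fails with an out-of-memory error."""
    try:
        return score_fn(sentence)
    except RuntimeError as err:
        # Only swallow out-of-memory errors; anything else is re-raised.
        if "out of memory" not in str(err):
            raise
    # Reached only after an OOM. Cleaning up outside the except block means
    # the exception (and the tensors its traceback references) is gone first.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return None
```

It could be called as score_or_none(lambda s: score(model, tokenizer, s), sentence), skipping sentences for which it returns None.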
This seemed to work until around 210 sentences, and then it just outputs the error:
CUDA error: an illegal memory access was encountered CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
I found this, which had a lot of discussion and ideas; some were about potentially faulty GPUs. But I know that my GPU works, as this exact code works for other models. There's also talk about batch size here, which is why I thought it might be related to freeing up memory.
I tried running torch.cuda.empty_cache() to free the memory, like in here, after every few epochs, but it didn't work (it threw the same error).
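Since the batch passed to the model has one row per token, one way to act on the batch-size idea would be to feed the masked rows to the model in smaller chunks. The following is only a sketch under assumptions: score_chunked, chunk_size and device are names introduced here, the loss is recomputed from the model's logits with cross_entropy instead of being read from .loss, and it has not been confirmed to resolve the illegal memory access error.

```python
import math
import torch
import torch.nn.functional as F

def score_chunked(model, tokenizer, sentence, chunk_size=16, device="cuda"):
    """Pseudo-perplexity as in score(), but with at most chunk_size rows per forward pass."""
    tensor_input = tokenizer.encode(sentence, return_tensors="pt")
    n = tensor_input.size(-1)
    repeat_input = tensor_input.repeat(n - 2, 1)
    mask = torch.ones(n - 1).diag(1)[:-2]
    masked_input = repeat_input.masked_fill(mask == 1, tokenizer.mask_token_id)
    labels = repeat_input.masked_fill(masked_input != tokenizer.mask_token_id, -100)
    n_targets = int((labels != -100).sum())  # number of positions that are scored

    total_loss = 0.0
    with torch.inference_mode():
        for start in range(0, n - 2, chunk_size):
            batch = masked_input[start:start + chunk_size].to(device)
            target = labels[start:start + chunk_size].to(device)
            logits = model(batch).logits  # (rows, n, vocab_size)
            # Sum the per-token losses so chunks of different sizes combine correctly.
            total_loss += F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),
                target.reshape(-1),
                ignore_index=-100,
                reduction="sum",
            ).item()
    # Mean loss over all scored positions, then exponentiate.
    return math.exp(total_loss / n_targets)
```

Summing the losses and dividing once at the end gives the same mean as a single large batch, while peak memory depends on chunk_size rather than on the full number of rows.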
from How to free GPU memory in PyTorch