There's a set of models on the Hugging Face Hub that come from the sentence_transformers
library, e.g. https://huggingface.co/cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
The suggested usage examples are:
# Using sentence_transformers
from sentence_transformers import CrossEncoder
model_name = 'cross-encoder/mmarco-mMiniLMv2-L12-H384-v1'
model = CrossEncoder(model_name)
scores = model.predict([
    ('How many people live in Berlin?', 'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.'),
    ('How many people live in Berlin?', 'New York City is famous for the Metropolitan Museum of Art.')
])
scores
[out]:
array([ 0.36782095, -4.2674575 ], dtype=float32)
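For context, each score is the model's relevance estimate for one (query, passage) pair, so ranking a set of candidate passages is just a sort over the scores. A minimal sketch of that (the argsort post-processing is my own illustration, not part of the model card):

import numpy as np
from sentence_transformers import CrossEncoder

model = CrossEncoder('cross-encoder/mmarco-mMiniLMv2-L12-H384-v1')

query = 'How many people live in Berlin?'
passages = [
    'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
    'New York City is famous for the Metropolitan Museum of Art.',
]

# Score every (query, passage) pair, then print passages by descending score.
scores = model.predict([(query, p) for p in passages])
for idx in np.argsort(scores)[::-1]:
    print(scores[idx], passages[idx])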
Or, using transformers directly:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained('cross-encoder/mmarco-mMiniLMv2-L12-H384-v1')
tokenizer = AutoTokenizer.from_pretrained('cross-encoder/mmarco-mMiniLMv2-L12-H384-v1')

# The two lists are paired element-wise: one (query, passage) pair per row.
features = tokenizer(['How many people live in Berlin?', 'How many people live in Berlin?'],
                     ['Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.', 'New York City is famous for the Metropolitan Museum of Art.'],
                     padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    scores = model(**features).logits
print(scores)
[out]:
tensor([[10.7615],
[-8.1277]])
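The shape of that output is worth noting: these cross-encoder models have a single output logit per pair, so .logits comes back as a (batch, 1) tensor. If scores in (0, 1) are preferred, squashing the logit with a sigmoid is one option; this is my own post-processing, not something the model card prescribes:

import torch

# Continuing from the snippet above: squeeze the (batch, 1) logits down to a
# 1-D tensor and map each logit into (0, 1) with a sigmoid.
probs = torch.sigmoid(scores.squeeze(-1))
print(probs)  # near 1 for the relevant pair, near 0 for the irrelevant one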
If a user tries to use transformers.pipeline with one of these cross-encoder models:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline

model = AutoModelForSequenceClassification.from_pretrained('cross-encoder/mmarco-mMiniLMv2-L12-H384-v1')
tokenizer = AutoTokenizer.from_pretrained('cross-encoder/mmarco-mMiniLMv2-L12-H384-v1')

pipe = pipeline(model=model, tokenizer=tokenizer)
It throws an error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_108/785368641.py in <module>
----> 1 pipe = pipeline(model=model, tokenizer=tokenizer)
/opt/conda/lib/python3.7/site-packages/transformers/pipelines/__init__.py in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, use_auth_token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
711 if not isinstance(model, str):
712 raise RuntimeError(
--> 713 "Inferring the task automatically requires to check the hub with a model_id defined as a `str`."
714 f"{model} is not a valid model_id."
715 )
RuntimeError: Inferring the task automatically requires to check the hub with a model_id defined as a `str`.
Q: How to use a cross-encoder with the Hugging Face transformers pipeline?
Q: If a model_id is needed, is it possible to pass the model_id as an arg or kwarg to pipeline?
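For what it's worth, the error message itself suggests a workaround: pipeline() only needs to query the hub when it has to infer the task, so specifying the task explicitly should sidestep the check. Below is a sketch along those lines, treating the cross-encoder as an ordinary single-label text-classification model; the task name and the function_to_apply argument are my reading of the transformers API, not something the model card documents:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_id = 'cross-encoder/mmarco-mMiniLMv2-L12-H384-v1'
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Passing the task explicitly means pipeline() no longer has to infer it,
# so a model object (rather than a string model_id) is accepted.
pipe = pipeline('text-classification', model=model, tokenizer=tokenizer)

# Sentence pairs go in as {"text": ..., "text_pair": ...} dicts;
# function_to_apply="none" returns the raw logit instead of a sigmoid/softmax.
print(pipe(
    {'text': 'How many people live in Berlin?',
     'text_pair': 'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.'},
    function_to_apply='none',
))

Alternatively, since pipeline() also accepts the model as a string, passing the model_id itself, e.g. pipeline(model='cross-encoder/mmarco-mMiniLMv2-L12-H384-v1'), should let it infer the task from the hub, assuming the hub metadata tags the model with a pipeline task that transformers supports.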
There's a similar question, Error: Inferring the task automatically requires to check the hub with a model_id defined as a `str`. AraBERT model, but I'm not sure it's the same issue, since that question is about 'aubmindlab/bert-base-arabertv02' and not the cross-encoder class of models from sentence_transformers.