This model is a fine-tuned version (on MS MARCO) of the pre-trained Contriever model available at https://huggingface.co/facebook/contriever, following the approach described in Towards Unsupervised Dense Information Retrieval with Contrastive Learning. The associated GitHub repository is available at https://github.com/facebookresearch/contriever.
Usage (HuggingFace Transformers)
Using the model directly with HuggingFace Transformers requires adding a mean pooling operation over the token embeddings to obtain a sentence embedding.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('facebook/contriever-msmarco')
model = AutoModel.from_pretrained('facebook/contriever-msmarco')

sentences = [
    "Where was Marie Curie born?",
    "Maria Sklodowska, later known as Marie Curie, was born on November 7, 1867.",
    "Born in Paris on 15 May 1859, Pierre Curie was the son of Eugène Curie, a doctor of French Catholic origin from Alsace."
]

# Mean pooling: average the token embeddings, ignoring padded positions
def mean_pooling(token_embeddings, mask):
    token_embeddings = token_embeddings.masked_fill(~mask[..., None].bool(), 0.)
    sentence_embeddings = token_embeddings.sum(dim=1) / mask.sum(dim=1)[..., None]
    return sentence_embeddings

# Apply tokenizer
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings (no gradients needed for inference)
with torch.no_grad():
    outputs = model(**inputs)

# Pool the last hidden states into one embedding per sentence
embeddings = mean_pooling(outputs[0], inputs['attention_mask'])
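Since Contriever is trained with a dot-product similarity between mean-pooled embeddings, the resulting sentence embeddings can be compared directly. A minimal sketch continuing the snippet above (it reuses the `embeddings` tensor, where index 0 is the query and indices 1 and 2 are passages):

# Score each passage against the query with a dot product;
# a higher score indicates higher estimated relevance.
score01 = embeddings[0] @ embeddings[1]  # query vs. matching passage
score02 = embeddings[0] @ embeddings[2]  # query vs. distractor passage
print(score01.item(), score02.item())

In this example, the passage about Marie Curie's birth should score higher against the query than the passage about Pierre Curie.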