dpr-ctx_encoder-bert-base-multilingual
Description
Multilingual DPR model based on bert-base-multilingual-cased.
DPR model
DPR repo
Data
question pairs for train: 644,217
question pairs for dev: 73,710
*DRCD and MLQA are converted using the squad_to_dpr.py script from haystack.
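To make the conversion step concrete, here is a minimal sketch of turning SQuAD-style JSON into DPR-style training records. It is not the haystack squad_to_dpr.py script: the function name `squad_to_dpr_records` is hypothetical, the paragraph that contains the answer is simply used as the positive context, and the negative lists are left empty because this sketch does not reproduce how the haystack script selects negative passages.

```python
def squad_to_dpr_records(squad):
    """Convert a SQuAD-style dict into a list of DPR-style records (simplified)."""
    records = []
    for article in squad["data"]:
        title = article.get("title", "")
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = [a["text"] for a in qa.get("answers", [])]
                if not answers:
                    # Skip unanswerable questions: no positive passage exists.
                    continue
                records.append({
                    "question": qa["question"],
                    "answers": answers,
                    # Assumption: the source paragraph is the only positive context.
                    "positive_ctxs": [{"title": title, "text": context}],
                    # Left empty in this sketch; negatives are not mined here.
                    "negative_ctxs": [],
                    "hard_negative_ctxs": [],
                })
    return records


# Tiny made-up example input, for illustration only.
sample = {
    "data": [{
        "title": "Dogs",
        "paragraphs": [{
            "context": "Dogs are domesticated animals.",
            "qas": [
                {"question": "What are dogs?",
                 "answers": [{"text": "domesticated animals", "answer_start": 9}]},
                {"question": "Can dogs fly?", "answers": []},
            ],
        }],
    }]
}
records = squad_to_dpr_records(sample)
print(len(records), records[0]["question"])
```

Each record pairs one question with its answers and one positive passage, which is the shape a DPR training script expects to read.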
Training Script
I use the script from haystack
Usage
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer
tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('voidful/dpr-question_encoder-bert-base-multilingual')
model = DPRQuestionEncoder.from_pretrained('voidful/dpr-question_encoder-bert-base-multilingual')
input_ids = tokenizer("Hello, is my dog cute ?", return_tensors='pt')["input_ids"]
embeddings = model(input_ids).pooler_output
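Once a question embedding is available, DPR ranks passages by the dot product between the question embedding and each passage embedding (the passage embeddings would come from the matching context encoder). The sketch below shows only that scoring step. The vectors are made-up placeholders rather than real model outputs, and the helper names `dot` and `rank_passages` are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(question_vec, passage_vecs):
    """Return (passage_index, score) pairs sorted by descending score."""
    scores = [(i, dot(question_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Placeholder embeddings; real ones would be pooler_output vectors.
question_vec = [0.5, 1.0, -0.5]
passage_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]
ranking = rank_passages(question_vec, passage_vecs)
print(ranking)  # [(1, 2.0), (0, 0.5), (2, -0.5)]
```

In practice the passage embeddings are computed once and stored in an index, so only the question needs to be encoded at query time.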
Follow the tutorial from haystack: Better Retrievers via “Dense Passage Retrieval”
from haystack.retriever.dense import DensePassageRetriever
retriever = DensePassageRetriever(document_store=document_store,
                                  query_embedding_model="voidful/dpr-question_encoder-bert-base-multilingual",
                                  passage_embedding_model="voidful/dpr-ctx_encoder-bert-base-multilingual",
                                  max_seq_len_query=64,
                                  max_seq_len_passage=256,
                                  batch_size=16,
                                  use_gpu=True,
                                  embed_title=True,
                                  use_fast_tokenizers=True)