megagonlabs/transformers-ud-japanese-electra-base-ginza-510
transformers-ud-japanese-electra-ginza-510 (sudachitra-wordpiece, mC4 Japanese)
This is an ELECTRA model pretrained on approximately 200M Japanese sentences extracted from mC4 and fine-tuned with spaCy v3 on UD_Japanese_BCCWJ r2.8.
The pretrained base model is megagonlabs/transformers-ud-japanese-electra-base-discriminator.
The complete spaCy v3 model is distributed on PyPI as a Python package named ja_ginza_electra, together with GiNZA v5, which provides custom pipeline components for recognizing Japanese bunsetu-phrase structures.
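A bunsetu phrase is roughly a content word together with the function words (particles, auxiliaries, punctuation) that follow it. The sketch below illustrates that idea with a deliberately simplified greedy heuristic; it is not GiNZA's implementation. The Tok record, the FUNCTION_POS set, the group_bunsetu helper and the hand-written parse of 銀座でランチをご一緒しましょう。 are hypothetical assumptions for illustration, not output produced by ja_ginza_electra.

```python
from dataclasses import dataclass


# Hypothetical token record. Field names loosely echo spaCy Token attributes
# (orth_, pos_, dep_, head.i), but this is plain Python, not spaCy.
@dataclass
class Tok:
    i: int       # token index within the sentence
    orth: str    # surface form
    pos: str     # UD part-of-speech tag
    dep: str     # dependency label
    head: int    # index of the syntactic head


# Simplifying assumption: tokens with these UD tags never start a new bunsetu.
FUNCTION_POS = {"ADP", "AUX", "PART", "PUNCT", "SCONJ"}


def group_bunsetu(tokens):
    """Greedy bunsetu chunking heuristic (NOT GiNZA's recognizer).

    A token joins the current phrase if it is a function word, or if the
    previous token is a compound modifier headed by this token; otherwise
    it opens a new phrase.
    """
    phrases = []
    for idx, tok in enumerate(tokens):
        prev = tokens[idx - 1] if idx > 0 else None
        is_function = tok.pos in FUNCTION_POS
        continues_compound = (
            prev is not None and prev.dep == "compound" and prev.head == tok.i
        )
        if phrases and (is_function or continues_compound):
            phrases[-1].append(tok)
        else:
            phrases.append([tok])
    return ["".join(t.orth for t in phrase) for phrase in phrases]


# Hand-written, hypothetical parse used only to exercise the heuristic.
SENTENCE = [
    Tok(0, "銀座", "PROPN", "obl", 5),
    Tok(1, "で", "ADP", "case", 0),
    Tok(2, "ランチ", "NOUN", "obj", 5),
    Tok(3, "を", "ADP", "case", 2),
    Tok(4, "ご", "NOUN", "compound", 5),
    Tok(5, "一緒", "VERB", "ROOT", 5),
    Tok(6, "し", "AUX", "aux", 5),
    Tok(7, "ましょう", "AUX", "aux", 5),
    Tok(8, "。", "PUNCT", "punct", 5),
]

if __name__ == "__main__":
    print(" / ".join(group_bunsetu(SENTENCE)))
```

Keying the rule on part-of-speech and dependency labels works here because a UD parse already attaches case particles and auxiliaries to their content word; a production recognizer has to handle many constructions this greedy rule gets wrong.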
To try it, install both packages and run the ginza command:
pip install ginza ja_ginza_electra
ginza
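The ginza command reads Japanese text from standard input and, by default, prints a CoNLL-U-style analysis: one tab-separated line of ten columns per token, comment lines starting with #, and a blank line between sentences (check ginza --help for the output formats your installed version supports). As a minimal sketch, assuming that default format, the hypothetical parse_conllu helper below turns such output into Python dictionaries. The SAMPLE text is hand-written for illustration, not real model output.

```python
# Standard CoNLL-U column names, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Split CoNLL-U text into sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # comment lines such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        token = dict(zip(FIELDS, cols))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])  # 0 marks the sentence root
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


def _row(*cols):
    """Join columns with tabs to build one CoNLL-U token line."""
    return "\t".join(str(c) for c in cols)


# Hand-written, hypothetical sample in CoNLL-U layout.
SAMPLE = "\n".join([
    "# text = ランチを食べる。",
    _row(1, "ランチ", "ランチ", "NOUN", "_", "_", 3, "obj", "_", "_"),
    _row(2, "を", "を", "ADP", "_", "_", 1, "case", "_", "_"),
    _row(3, "食べる", "食べる", "VERB", "_", "_", 0, "root", "_", "_"),
    _row(4, "。", "。", "PUNCT", "_", "_", 3, "punct", "_", "_"),
    "",
])

if __name__ == "__main__":
    for sentence in parse_conllu(SAMPLE):
        for tok in sentence:
            print(tok["id"], tok["form"], tok["upos"], tok["deprel"], tok["head"])
```

With the packages installed, the same function could be applied to text captured from the real command, for example by reading sys.stdin in a script that receives ginza's output through a shell pipe.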
Licenses
The models are distributed under the terms of the MIT License.
Acknowledgments
Publication of this model under the MIT License is permitted by a joint research agreement between NINJAL (National Institute for Japanese Language and Linguistics) and Megagon Labs Tokyo.
Citations
- mC4
Contains information from mC4, which is made available under the ODC Attribution License.
@article{2019t5,
author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
journal = {arXiv e-prints},
year = {2019},
archivePrefix = {arXiv},
eprint = {1910.10683},
}
- UD_Japanese_BCCWJ r2.8
Asahara, M., Kanayama, H., Tanaka, T., Miyao, Y., Uematsu, S., Mori, S.,
Matsumoto, Y., Omura, M., & Murawaki, Y. (2018).
Universal Dependencies Version 2 for Japanese.
In LREC-2018.
- GSK2014-A(2019)