language:
  - ain
tags:
  - ainu
  - token-classification
  - pos
  - dependency-parsing
base_model: KoichiYasuoka/roberta-base-ainu
license: cc-by-sa-4.0
pipeline_tag: token-classification
widget:
  - text: itak=as awa pon rupne aynu ene itaki
  - text: イタカㇱ アワ ポン ルㇷ゚ネ アイヌ エネ イタキ
  - text: итакас ава пон рубне айну эне итакі

roberta-base-ainu-upos

Model Description

This is a RoBERTa model pre-trained on Ainu texts (in Katakana, Roman, and Cyrillic scripts) for POS-tagging and dependency-parsing, derived from roberta-base-ainu. Every word is tagged with its UPOS (Universal Part-Of-Speech) tag.

How to Use

from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the token-classification (UPOS) model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("KoichiYasuoka/roberta-base-ainu-upos")
model = AutoModelForTokenClassification.from_pretrained("KoichiYasuoka/roberta-base-ainu-upos")
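
Once the tokenizer and model are loaded, tagging a sentence amounts to taking the highest-scoring label for each token. The following is a minimal sketch, not the author's reference usage: `tag_tokens` and `demo` are hypothetical helper names introduced here, the input sentence is the first widget example from this card, and the labels are printed exactly as stored in `model.config.id2label`. Note that the tokenizer may split a word into several subword tokens, in which case each piece receives its own label.

```python
def tag_tokens(tokens, label_ids, id2label, special_tokens=()):
    """Pair each token with its predicted label, skipping special tokens.

    Hypothetical helper for illustration; not part of transformers.
    """
    return [
        (token, id2label[label_id])
        for token, label_id in zip(tokens, label_ids)
        if token not in special_tokens
    ]


def demo():
    # Imports are local so the helper above can be used without torch installed.
    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    name = "KoichiYasuoka/roberta-base-ainu-upos"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForTokenClassification.from_pretrained(name)

    text = "itak=as awa pon rupne aynu ene itaki"
    encoded = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoded).logits  # shape: (1, sequence_length, num_labels)

    # Highest-scoring label id for every position in the single input sentence.
    label_ids = logits.argmax(dim=-1)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(encoded["input_ids"][0].tolist())

    for token, tag in tag_tokens(
        tokens, label_ids, model.config.id2label, tokenizer.all_special_tokens
    ):
        print(token, tag)
```

Calling `demo()` downloads the model on first use and prints one token and label per line.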

or

import esupar
nlp=esupar.load("KoichiYasuoka/roberta-base-ainu-upos","ainu")
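
With esupar, the loaded `nlp` object can be applied to a sentence to obtain both POS tags and dependency relations. The sketch below assumes that converting the result to a string yields CoNLL-U text, as esupar's documentation shows for `print(doc)`. `conllu_rows` and `demo` are hypothetical helper names introduced here, and the `esupar.load` call is copied unchanged from the snippet above.

```python
def conllu_rows(conllu_text):
    """Extract id, form, UPOS, head, and deprel from CoNLL-U text.

    Hypothetical helper for illustration; comment lines, blank lines,
    and lines without the ten tab-separated CoNLL-U columns are skipped.
    """
    rows = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        rows.append(
            {
                "id": cols[0],
                "form": cols[1],
                "upos": cols[3],
                "head": cols[6],
                "deprel": cols[7],
            }
        )
    return rows


def demo():
    # Local import so the parser above works without esupar installed.
    import esupar

    nlp = esupar.load("KoichiYasuoka/roberta-base-ainu-upos", "ainu")
    doc = nlp("itak=as awa pon rupne aynu ene itaki")

    # Assumption: str(doc) is the CoNLL-U rendering of the parse.
    for row in conllu_rows(str(doc)):
        print(row["id"], row["form"], row["upos"], row["head"], row["deprel"])
```

Calling `demo()` prints one word per line with its UPOS tag, the index of its head word, and the dependency relation.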

Reference

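Koichi Yasuoka (安岡孝一): Development of Ainu RoBERTa and DeBERTa Models Using Roman, Katakana, and Cyrillic Scripts (ローマ字・カタカナ・キリル文字併用アイヌ語RoBERTa・DeBERTaモデルの開発), IPSJ SIG Technical Report (情報処理学会研究報告), Vol.2023-CH-131 "Computers and the Humanities", No.7 (February 18, 2023), pp.1-7.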

See Also

esupar: Tokenizer POS-tagger and Dependency-parser with BERT/RoBERTa/DeBERTa models