---
license: mit
language:
- ja
tags:
- generated_from_trainer
- ner
- bert
metrics:
- f1
model-index:
- name: xlm-roberta-ner-ja
results: []
widget:
- text: "鈴木は4月の陽気の良い日に、鈴をつけて熊本県の阿蘇山に登った"
- text: "中国では、中国共産党による一党統治が続く"
---
# xlm-roberta-ner-ja
(Japanese caption: 日本語の固有表現抽出のモデル, i.e. a model for Japanese named entity extraction)

This model is a named entity recognition (NER) token-classification model, fine-tuned from [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) (a pre-trained cross-lingual `RobertaModel`) on a Japanese NER dataset built from Wikipedia by Stockmark Inc.<br>
See [here](https://github.com/stockmarkteam/ner-wikipedia-dataset) for the license of this dataset.

Each token is labeled with one of the following tags (a sketch showing how to read this mapping from the model config follows the table):
| Label id | Tag | Tag in Widget | Description |
|---|---|---|---|
| 0 | O | (None) | others or nothing |
| 1 | PER | PER | person |
| 2 | ORG | ORG | general corporation or organization |
| 3 | ORG-P | P | political organization |
| 4 | ORG-O | O | other organization |
| 5 | LOC | LOC | location |
| 6 | INS | INS | institution, facility |
| 7 | PRD | PRD | product |
| 8 | EVT | EVT | event |
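The mapping above is also stored in the checkpoint's configuration, so it can be inspected programmatically. A minimal sketch, assuming the table is recorded in the config's `id2label` field:

```python
from transformers import AutoConfig

# Read the id-to-tag mapping from the checkpoint's config.
config = AutoConfig.from_pretrained("tsmatz/xlm-roberta-ner-ja")
for label_id, tag in sorted(config.id2label.items()):
    print(label_id, tag)
```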
## Intended uses
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "tsmatz/xlm-roberta-ner-ja"

# Load the fine-tuned checkpoint and its tokenizer, then build a
# token-classification pipeline from them.
model = AutoModelForTokenClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
classifier = pipeline("token-classification", model=model, tokenizer=tokenizer)
result = classifier("鈴木は4月の陽気の良い日に、鈴をつけて熊本県の阿蘇山に登った")
print(result)
```
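By default the pipeline returns one prediction per sub-word token. Continuing from the snippet above, the pipeline's standard `aggregation_strategy` option can merge sub-word pieces into whole entity spans; a minimal sketch (exact span boundaries depend on the tokenizer):

```python
# Group sub-word tokens into whole entity spans (e.g. PER, LOC).
grouped = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)
print(grouped("中国では、中国共産党による一党統治が続く"))
```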
## Training procedure
You can download the source code for fine-tuning from [here](https://github.com/tsmatz/huggingface-finetune-japanese/blob/master/02-summarize.ipynb).
### Training hyperparameters
The following hyperparameters were used during training (they map onto `TrainingArguments` as sketched after the list):
- learning_rate: 5e-05
- train_batch_size: 12
- eval_batch_size: 12
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
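A hedged sketch of these settings expressed as Hugging Face `TrainingArguments`; `output_dir` and `evaluation_strategy` are illustrative assumptions, not taken from the original run:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="xlm-roberta-ner-ja",  # assumed output path
    learning_rate=5e-5,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=5,
    evaluation_strategy="epoch",  # assumption; matches the per-epoch results below
)
```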
### Training results
| Training Loss | Epoch | Step | Validation Loss | F1 |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| No log | 1.0 | 446 | 0.1510 | 0.8457 |
| No log | 2.0 | 892 | 0.0626 | 0.9261 |
| No log | 3.0 | 1338 | 0.0366 | 0.9580 |
| No log | 4.0 | 1784 | 0.0196 | 0.9792 |
| No log | 5.0 | 2230 | 0.0173 | 0.9864 |
### Framework versions
- Transformers 4.23.1
- Pytorch 1.12.1+cu102
- Datasets 2.6.1
- Tokenizers 0.13.1