---
language: ky
datasets:
- wikiann
examples:
widget:
- text: "Бириккен Улуттар Уюму"
  example_title: "Sentence_1"
- text: "Жусуп Мамай"
  example_title: "Sentence_2"
---
<h1>Kyrgyz Named Entity Recognition</h1>

This model fine-tunes bert-base-multilingual-cased on the WikiANN dataset to perform named entity recognition (NER) on Kyrgyz text.

WARNING: this model is not yet usable in practice (see the metrics below). I'll update the model after cleaning up the WikiANN dataset and re-training.
## Label IDs and their corresponding label names

| Label ID | Label Name |
| -------- | ---------- |
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |
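If you load the raw model outputs rather than a pipeline, the table above can be expressed as a plain mapping. A minimal sketch (the dictionary below is built from the table, not read from the model config):

```python
# Label ID to label name mapping, taken from the table above
id2label = {
    0: "O",
    1: "B-PER",
    2: "I-PER",
    3: "B-ORG",
    4: "I-ORG",
    5: "B-LOC",
    6: "I-LOC",
}

# Inverse mapping, useful when preparing training labels
label2id = {name: i for i, name in id2label.items()}
```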
<h1>Results</h1>

| Name | Overall F1 | LOC F1 | ORG F1 | PER F1 |
| ---- | ---------- | ------ | ------ | ------ |
| Train set | 0.595683 | 0.570312 | 0.687179 | 0.549180 |
| Validation set | 0.461333 | 0.551181 | 0.401913 | 0.425087 |
| Test set | 0.442622 | 0.456852 | 0.469565 | 0.413114 |
<h1>Example</h1>

```py
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned model and its tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained("murat/kyrgyz_language_NER")
model = AutoModelForTokenClassification.from_pretrained("murat/kyrgyz_language_NER")

# Build a token-classification (NER) pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

example = "Жусуп Мамай"
ner_results = nlp(example)
print(ner_results)
```
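The pipeline returns one prediction per token, tagged with the B-/I- scheme from the label table. If you want whole entity spans instead, one option is to merge them yourself; `merge_bio` below is a hypothetical helper, not part of this model or of transformers:

```python
def merge_bio(tokens, tags):
    """Merge BIO-tagged tokens into (entity_text, entity_type) spans."""
    spans, current, ctype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span, closing any open one
            if current:
                spans.append((" ".join(current), ctype))
            current, ctype = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == ctype:
            # Continue the current span only if the entity type matches
            current.append(tok)
        else:
            # "O" tag or a dangling I- tag closes the open span
            if current:
                spans.append((" ".join(current), ctype))
            current, ctype = [], None
    if current:
        spans.append((" ".join(current), ctype))
    return spans


# For the example sentence above, assuming the model tags both words as PER:
print(merge_bio(["Жусуп", "Мамай"], ["B-PER", "I-PER"]))
# [("Жусуп Мамай", "PER")]
```

Alternatively, recent versions of transformers can do this grouping for you via `pipeline("ner", ..., aggregation_strategy="simple")`.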