bert-ner-turkish-cased
This model is a fine-tuned version of dbmdz/bert-base-turkish-cased on a custom Turkish NER dataset. It achieves the following results on the evaluation set:
- Loss: 0.0987
- Precision: 0.9112
- Recall: 0.9364
- F1: 0.9236
- Accuracy: 0.9600
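As a quick consistency check on the numbers above, the reported F1 can be recomputed from the reported precision and recall. This is a minimal sketch that assumes F1 is the usual harmonic mean of the two; the card does not say which metric library produced these figures.

```python
# Reported evaluation metrics, copied from the list above.
precision = 0.9112
recall = 0.9364

# Assumption: F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 recomputed from precision and recall: {f1:.4f}")
```

The recomputed value agrees with the reported F1 to four decimal places.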
Model description
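This model identifies named entities in Turkish text and tags each token with one of the following labels (B- marks the first token of an entity, I- a continuation, O a token outside any entity):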
LABELS = [
"O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG",
"B-DATE", "I-DATE", "B-MONEY", "I-MONEY", "B-MISC", "I-MISC"
]
- PER: Person
- LOC: Location
- ORG: Organization
- DATE: Date
- MONEY: Money
- MISC: Miscellaneous Entities
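The label list can be turned into the lookup tables that token classification code usually needs. The sketch below assumes that class indices follow the order of `LABELS`; the card does not confirm this, so the model's own configuration should be treated as authoritative. The helper `entity_type` is a hypothetical name introduced for illustration.

```python
from typing import Optional

LABELS = [
    "O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG",
    "B-DATE", "I-DATE", "B-MONEY", "I-MONEY", "B-MISC", "I-MISC",
]

# Assumption: class indices follow list order; verify against the model config.
label2id = {label: index for index, label in enumerate(LABELS)}
id2label = {index: label for label, index in label2id.items()}


def entity_type(label: str) -> Optional[str]:
    """Strip the B-/I- prefix; return None for the outside tag "O"."""
    if label == "O":
        return None
    return label.split("-", 1)[1]


# Collect the distinct entity types covered by the tag set.
entity_types = sorted({t for t in map(entity_type, LABELS) if t is not None})
print(entity_types)
```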
Intended uses & limitations
This model is intended for extracting named entities from Turkish text in NLP pipelines.
How to Use
from transformers import pipeline
model_name = "yeniguno/bert-ner-turkish-cased"
ner_pipeline = pipeline("ner", model=model_name, tokenizer=model_name, aggregation_strategy="simple")
text = """Selim Parlak, 2023-11-15 tarihinde, CUMHURİYET MAH. DUMAN SOKAK 22500 HAVSA/EDİRNE adresinden, Dünya Varlık Yönetim A.Ş. aracılığıyla 850 TRY değerindeki MP.2386.JPA.IP5.WHT.I İPHONE5 ŞARJLI KILIF "AİR" 1700 MAH (BEYAZ) ürününü satın aldı."""
results = ner_pipeline(text)
for result in results:
    print(result)
"""
{'entity_group': 'PER', 'score': 0.9993254, 'word': 'Selim Parlak', 'start': 0, 'end': 12}
{'entity_group': 'DATE', 'score': 0.9987677, 'word': '2023 - 11 - 15', 'start': 14, 'end': 24}
{'entity_group': 'LOC', 'score': 0.99951524, 'word': 'CUMHURİYET MAH. DUMAN SOKAK 22500 HAVSA / EDİRNE', 'start': 36, 'end': 82}
{'entity_group': 'ORG', 'score': 0.8487069, 'word': 'Dünya Varlık Yönetim A. Ş.', 'start': 95, 'end': 120}
{'entity_group': 'MONEY', 'score': 0.9970985, 'word': '850 TRY', 'start': 134, 'end': 141}
{'entity_group': 'MISC', 'score': 0.97721404, 'word': 'MP. 2386. JPA. IP5. WHT. I İPHONE5 ŞARJLI KILIF " AİR " 1700 MAH ( BEYAZ )', 'start': 154, 'end': 219}
"""
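In the sample output, the `word` field is rebuilt from tokens and can differ from the input (for example `2023 - 11 - 15` instead of `2023-11-15`). A possible way to recover the exact surface text is to slice the input with the `start` and `end` offsets. The sketch below runs on two entries copied from the output above and on the opening of the sample sentence only, so it needs no model download. `surface_form` and `group_by_entity` are hypothetical helper names, and the `min_score` threshold is an illustrative choice, not a recommendation from the model author.

```python
from collections import defaultdict

# Opening of the sample sentence; the offsets below refer to this text.
text = "Selim Parlak, 2023-11-15 tarihinde"

# Two entries copied from the sample pipeline output.
sample_results = [
    {"entity_group": "PER", "score": 0.9993254, "word": "Selim Parlak", "start": 0, "end": 12},
    {"entity_group": "DATE", "score": 0.9987677, "word": "2023 - 11 - 15", "start": 14, "end": 24},
]


def surface_form(source, result):
    """Return the exact input substring covered by an entity."""
    return source[result["start"]:result["end"]]


def group_by_entity(source, results, min_score=0.0):
    """Group entity surface forms by type, skipping low-confidence entries."""
    grouped = defaultdict(list)
    for result in results:
        if result["score"] >= min_score:
            grouped[result["entity_group"]].append(surface_form(source, result))
    return dict(grouped)


print(group_by_entity(text, sample_results))
```

Slicing by offsets keeps punctuation and spacing exactly as written in the input, which matters for fields such as dates and product codes.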
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-06
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 20
- mixed_precision_training: Native AMP
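To illustrate what a linear schedule implies for these settings, the sketch below computes the learning rate at a given optimizer step. It assumes no warmup and a schedule spanning all 20 configured epochs, neither of which is stated in the card; the 1527 steps per epoch come from the training results table. `linear_lr` is a hypothetical helper, not a library function.

```python
BASE_LR = 2e-06
NUM_EPOCHS = 20
STEPS_PER_EPOCH = 1527  # taken from the Step column of the training results
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS):
    """Linearly decay the learning rate to zero (assumes zero warmup steps)."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return base_lr * remaining


# Learning rate at the end of each of the first nine epochs.
for epoch in range(1, 10):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch}: step {step}, lr {linear_lr(step):.3e}")
```

Under these assumptions the learning rate at the end of epoch 9 would still be a little over half of its starting value.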
Training results
Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|---|---|---|
0.1351 | 1.0 | 1527 | 0.1158 | 0.8592 | 0.9070 | 0.8825 | 0.9517 |
0.1088 | 2.0 | 3054 | 0.1045 | 0.8787 | 0.9336 | 0.9053 | 0.9574 |
0.1016 | 3.0 | 4581 | 0.0993 | 0.8901 | 0.9280 | 0.9086 | 0.9576 |
0.1102 | 4.0 | 6108 | 0.0963 | 0.8991 | 0.9277 | 0.9132 | 0.9587 |
0.0877 | 5.0 | 7635 | 0.0953 | 0.9046 | 0.9292 | 0.9167 | 0.9584 |
0.0933 | 6.0 | 9162 | 0.0939 | 0.9036 | 0.9321 | 0.9176 | 0.9593 |
0.0827 | 7.0 | 10689 | 0.0967 | 0.8986 | 0.9398 | 0.9188 | 0.9605 |
0.0933 | 8.0 | 12216 | 0.0949 | 0.9122 | 0.9292 | 0.9206 | 0.9593 |
0.084 | 9.0 | 13743 | 0.0987 | 0.9112 | 0.9364 | 0.9236 | 0.9600 |
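The table can be scanned programmatically to see which epoch is best under different criteria. The sketch below copies the validation loss and F1 columns from the table and picks the epoch with the lowest validation loss and the epoch with the highest F1. Which criterion the author actually used for checkpoint selection is not stated, so this is only an illustration.

```python
# (epoch, validation_loss, f1) copied from the training results table.
rows = [
    (1, 0.1158, 0.8825),
    (2, 0.1045, 0.9053),
    (3, 0.0993, 0.9086),
    (4, 0.0963, 0.9132),
    (5, 0.0953, 0.9167),
    (6, 0.0939, 0.9176),
    (7, 0.0967, 0.9188),
    (8, 0.0949, 0.9206),
    (9, 0.0987, 0.9236),
]

# Lowest validation loss and highest F1 need not fall on the same epoch.
best_by_loss = min(rows, key=lambda row: row[1])
best_by_f1 = max(rows, key=lambda row: row[2])

print(f"lowest validation loss: epoch {best_by_loss[0]} ({best_by_loss[1]})")
print(f"highest F1: epoch {best_by_f1[0]} ({best_by_f1[2]})")
```

The epoch 9 row matches the evaluation results reported at the top of this card.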
Framework versions
- Transformers 4.47.0
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0