**Train-Test Set:** "teknofest_train_final.csv"

**Model:** "dbmdz/bert-base-turkish-128k-uncased"

**Preprocessing**

- A special token (#) was inserted before each uppercase character, after which the characters were lowercased
- Punctuation marks were removed

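
The two preprocessing steps can be sketched as follows. This is a minimal illustration, not the original code: the function name `preprocess`, the Turkish-aware handling of `I`/`İ`, and the choice to strip ASCII punctuation before inserting the `#` markers (so the markers themselves survive) are all assumptions.

```python
import string

MARKER = "#"  # special token placed before each uppercase character

# Assumption: Turkish casing rules are wanted (I -> ı, İ -> i);
# Python's default str.lower() maps "I" to "i" instead.
TR_LOWER = {"I": "ı", "İ": "i"}

# Assumption: only ASCII punctuation is stripped.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Strip punctuation, then mark and lowercase uppercase characters."""
    # Punctuation goes first so that the inserted "#" markers are not removed.
    text = text.translate(PUNCT_TABLE)
    out = []
    for ch in text:
        if ch.isupper():
            out.append(MARKER + TR_LOWER.get(ch, ch.lower()))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(preprocess("Merhaba, Dünya!"))
```

Since the model is uncased, the marker is one way to keep capitalization information that lowercasing would otherwise discard.
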
## Tokenizer Parameters

```
max_length=64
padding=True
truncation=True
```

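
A hedged sketch of how these parameters might be passed to a Hugging Face tokenizer. The helper names `load_tokenizer` and `encode` are hypothetical, and loading through `AutoTokenizer` is an assumption, since this README does not show the original loading code.

```python
# Tokenizer parameters as listed in this README.
TOKENIZER_KWARGS = {"max_length": 64, "padding": True, "truncation": True}


def load_tokenizer(name="dbmdz/bert-base-turkish-128k-uncased"):
    """Load the tokenizer for the model named in this README."""
    # Imported lazily so the rest of the sketch runs without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(name)


def encode(tokenizer, texts):
    """Tokenize a batch of strings with the parameters above."""
    return tokenizer(texts, **TOKENIZER_KWARGS)


# Usage (downloads the tokenizer files on first call):
#   batch = encode(load_tokenizer(), ["#merhaba #dünya"])
#   print(batch["input_ids"])
```

With `padding=True`, sequences are padded to the longest one in the batch rather than to `max_length`, while `truncation=True` cuts anything longer than 64 tokens.
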
## Training Parameters

- **Epoch:** 3
- **Learning Rate:** 7e-5
- **Batch-Size:** 64
- **Tokenizer Length:** 64
- **Loss:** BCE
- **Online Hard Example Mining:** Enabled
- **Class-Weighting:** Enabled (^0.3)
- **Early Stopping:** Disabled
- **Stratified Batch Sampling:** Enabled
- **Gradient Accumulation:** Disabled
- **LR Scheduler:** Cosine-with-Warmup
- **Warmup Ratio:** 0.1
- **Weight Decay:** 0.01
- **LLRD:** 0.95
- **Label Smoothing:** 0.05
- **Gradient Clipping:** 1.0
- **MLM Pre-Training:** Disabled

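
Several of these settings are easier to read as formulas. The sketch below is pure Python and shows one plausible reading of four of them; it is not the original training code. The assumptions are: the schedule is linear warmup followed by cosine decay to zero, LLRD multiplies the learning rate by 0.95 per layer going down from the top of a 12-layer encoder, "^0.3" means balanced inverse-frequency weights raised to the power 0.3, and label smoothing pulls binary targets toward 0.5. All function names are hypothetical.

```python
import math

BASE_LR = 7e-5
WARMUP_RATIO = 0.1
LLRD = 0.95
LABEL_SMOOTHING = 0.05
CLASS_WEIGHT_POWER = 0.3


def lr_multiplier(step, total_steps, warmup_ratio=WARMUP_RATIO):
    """Linear warmup, then cosine decay to zero (assumed schedule shape)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def layerwise_lrs(num_layers=12, base_lr=BASE_LR, decay=LLRD):
    """Per-layer learning rates; index 0 is the embeddings, the last is the top layer.

    Assumption: the top encoder layer trains at base_lr and each layer
    below it is scaled by `decay` once more.
    """
    return [base_lr * decay ** (num_layers - i) for i in range(num_layers + 1)]


def class_weights(counts, power=CLASS_WEIGHT_POWER):
    """Balanced inverse-frequency weights, dampened by `power` (assumed form)."""
    total = sum(counts.values())
    k = len(counts)
    return {name: (total / (k * n)) ** power for name, n in counts.items()}


def smooth_targets(targets, eps=LABEL_SMOOTHING):
    """Label smoothing for BCE: move each 0/1 target toward 0.5 by eps (assumed form)."""
    return [y * (1.0 - eps) + 0.5 * eps for y in targets]


if __name__ == "__main__":
    # Class counts taken from the support column of the CV10 report in this README.
    counts = {"INSULT": 2393, "OTHER": 3528, "PROFANITY": 2376,
              "RACIST": 2033, "SEXIST": 2081}
    print(class_weights(counts))
    print(layerwise_lrs()[0], layerwise_lrs()[-1])
```

Raising the weights to a power below 1 keeps the ordering of the classes but shrinks the gap between the rarest and the most frequent one, which is a common way to soften class weighting.
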
## CV10 Results

```
              precision    recall  f1-score   support

      INSULT     0.9172    0.9260    0.9216      2393
       OTHER     0.9681    0.9646    0.9663      3528
   PROFANITY     0.9627    0.9571    0.9599      2376
      RACIST     0.9684    0.9651    0.9667      2033
      SEXIST     0.9618    0.9668    0.9643      2081

    accuracy                         0.9562     12411
   macro avg     0.9557    0.9559    0.9558     12411
weighted avg     0.9563    0.9562    0.9562     12411
```
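
The summary rows can be cross-checked against the per-class rows. The sketch below copies the per-class values from the report and recomputes the macro and weighted averages. The names `REPORT`, `macro_avg`, and `weighted_avg` are hypothetical, and small last-digit differences are expected because the inputs are already rounded to four decimals.

```python
# (precision, recall, f1, support) per class, copied from the report above.
REPORT = {
    "INSULT": (0.9172, 0.9260, 0.9216, 2393),
    "OTHER": (0.9681, 0.9646, 0.9663, 3528),
    "PROFANITY": (0.9627, 0.9571, 0.9599, 2376),
    "RACIST": (0.9684, 0.9651, 0.9667, 2033),
    "SEXIST": (0.9618, 0.9668, 0.9643, 2081),
}
PRECISION, RECALL, F1, SUPPORT = range(4)


def macro_avg(report, col):
    """Unweighted mean of one metric column across classes."""
    return sum(row[col] for row in report.values()) / len(report)


def weighted_avg(report, col):
    """Mean of one metric column, weighted by each class's support."""
    total = sum(row[SUPPORT] for row in report.values())
    return sum(row[col] * row[SUPPORT] for row in report.values()) / total


if __name__ == "__main__":
    print(sum(row[SUPPORT] for row in REPORT.values()))
    print(round(macro_avg(REPORT, F1), 4), round(weighted_avg(REPORT, F1), 4))
```

INSULT is the weakest class on all three metrics, while the other four sit within about one point of each other.
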