---
license: apache-2.0
---
### Deprem NER Training Results
```
              precision    recall  f1-score   support

           0       0.85      0.91      0.88       734
           1       0.77      0.84      0.80       207
           2       0.71      0.88      0.79       130
           3       0.68      0.76      0.72        94
           4       0.80      0.85      0.82       362
           5       0.63      0.59      0.61       112
           6       0.73      0.82      0.77       108
           7       0.55      0.77      0.64        78
           8       0.65      0.71      0.68        31
           9       0.70      0.85      0.76       117

   micro avg       0.77      0.85      0.81      1973
   macro avg       0.71      0.80      0.75      1973
weighted avg       0.77      0.85      0.81      1973
 samples avg       0.82      0.87      0.83      1973
```
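As a quick sanity check, the macro and weighted F1 rows can be recomputed from the per-class rows. The sketch below uses the rounded values printed in the report, so it only reproduces the averages to two decimals; the variable names are illustrative.

```python
# Per-class (f1-score, support) pairs copied from the report, classes 0..9.
per_class = [
    (0.88, 734), (0.80, 207), (0.79, 130), (0.72, 94), (0.82, 362),
    (0.61, 112), (0.77, 108), (0.64, 78), (0.68, 31), (0.76, 117),
]

total_support = sum(n for _, n in per_class)

# Macro average: every class counts equally.
macro_f1 = sum(f1 for f1, _ in per_class) / len(per_class)

# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(f1 * n for f1, n in per_class) / total_support

print(f"total support: {total_support}")
print(f"macro F1:      {macro_f1:.2f}")
print(f"weighted F1:   {weighted_f1:.2f}")
```

The gap between the two averages comes from class 0, which holds more than a third of the support and also has the highest F1.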
### Preprocessing Functions
```
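import re

from nltk.corpus import stopwords  # requires nltk.download("stopwords")

# NLTK's Turkish stop words plus a few informal extras.
tr_stopwords = stopwords.words('turkish')
tr_stopwords.append("hic")
tr_stopwords.append("dm")
tr_stopwords.append("vs")
tr_stopwords.append("ya")


def remove_punct(tok):
    # Drop every character that is neither a word character nor whitespace.
    tok = re.sub(r'[^\w\s]', '', tok)
    return tok


def normalize(tok):
    # Collapse purely numeric tokens into a single placeholder.
    if tok.isdigit():
        tok = "digit"
    return tok


def clean(tok):
    tok = remove_punct(tok)
    tok = normalize(tok)
    return tok


def exceptions(tok):
    # Return False for raw tokens that should be discarded.
    if not tok.isdigit() and len(tok) == 1:  # single non-digit characters
        return False
    if not tok:  # empty strings left by repeated spaces
        return False
    if tok in tr_stopwords:
        return False
    if tok.startswith('#') or tok.startswith("@"):  # hashtags and mentions
        return False
    return True


# Split on spaces, filter the raw tokens, then clean the survivors.
sm_tok = lambda text: [clean(tok) for tok in text.split(" ") if exceptions(tok)]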
```
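To illustrate what the tokenizer produces, here is a minimal, self-contained sketch of the same pipeline applied to a made-up tweet. The stop-word set is a small hard-coded stand-in (an assumption, so the example runs without NLTK data), and the sample text is invented.

```python
import re

# Stand-in stop words (assumption): the card uses NLTK's full Turkish list
# plus "hic", "dm", "vs" and "ya"; only a handful are hard-coded here.
tr_stopwords = {"ve", "bir", "bu", "hic", "dm", "vs", "ya"}


def clean(tok):
    # Strip punctuation, then map purely numeric tokens to "digit".
    tok = re.sub(r"[^\w\s]", "", tok)
    return "digit" if tok.isdigit() else tok


def exceptions(tok):
    # Same filtering rules as in the card, applied to the raw token.
    if not tok.isdigit() and len(tok) == 1:
        return False
    if not tok:
        return False
    if tok in tr_stopwords:
        return False
    if tok.startswith("#") or tok.startswith("@"):
        return False
    return True


def sm_tok(text):
    return [clean(tok) for tok in text.split(" ") if exceptions(tok)]


# Invented sample tweet, for illustration only.
sample = "@afad enkaz altında 3 kişi var ve acil yardım lazım, Atatürk cad. no: 12 #deprem"
tokens = sm_tok(sample)
print(tokens)
```

Two properties of the pipeline are worth noting: tokens are not lowercased, and the stop-word check runs before punctuation is stripped, so a stop word with attached punctuation such as `ya,` passes the filter and comes out as `ya`.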
### Other Hyperparameters
```
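from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",
    evaluation_strategy="epoch",  # evaluate at the end of every epoch
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    weight_decay=0.01,
    # In transformers 4.x, None falls back to the library default;
    # pass "none" to disable logging integrations.
    report_to=None,
    num_train_epochs=4,
)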
```
```
class_weights[0] = 1.0
class_weights[1] = 1.5167249178108022
class_weights[2] = 1.7547338578655642
class_weights[3] = 1.9610520059358458
class_weights[4] = 1.269341370129623
class_weights[5] = 1.8684086209021484
class_weights[6] = 1.8019018017117145
class_weights[7] = 2.110648663094536
class_weights[8] = 3.081208739200435
class_weights[9] = 1.7994815143101963
```
Threshold: 0.25
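
The card does not say how the class weights and the threshold are applied. The `samples avg` row in the results suggests a multi-label setup, so the sketch below assumes independent sigmoid outputs per class, a per-class multiplier on the binary cross-entropy terms, and a 0.25 probability cutoff at prediction time. The function names and the example logits are hypothetical.

```python
import math

# Values copied from the card.
THRESHOLD = 0.25
class_weights = [
    1.0, 1.5167249178108022, 1.7547338578655642, 1.9610520059358458,
    1.269341370129623, 1.8684086209021484, 1.8019018017117145,
    2.110648663094536, 3.081208739200435, 1.7994815143101963,
]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=THRESHOLD):
    # Return every class index whose sigmoid probability reaches the threshold.
    return [i for i, z in enumerate(logits) if sigmoid(z) >= threshold]


def weighted_bce(logits, targets, weights=class_weights):
    # Mean binary cross-entropy with class i's term scaled by weights[i].
    # Assumed usage: the card does not confirm this is how the weights enter the loss.
    total = 0.0
    for z, y, w in zip(logits, targets, weights):
        p = sigmoid(z)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)


# Hypothetical logits and gold labels for one example, one value per class 0..9.
logits = [2.0, -0.5, -3.0, -1.2, 0.3, -2.0, -4.0, -1.0, -5.0, -0.9]
targets = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]

print(predict_labels(logits))        # labels kept at the 0.25 cutoff
print(predict_labels(logits, 0.5))   # labels kept at a 0.5 cutoff
print(weighted_bce(logits, targets))
```

Under these assumptions, a cutoff below 0.5 accepts more labels per example, which is consistent with recall being higher than precision for most classes in the results.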