deprem-ml/deprem-loodos-bert-base-uncased

@@ -3,6 +3,69 @@ license: apache-2.0
 ---
 ### Deprem NER Training Results
 ```
 training_args = TrainingArguments(
     output_dir="./output",
@@ -15,24 +78,19 @@ training_args = TrainingArguments(
 )
 ```
-Threshold: 0.1
 ```
-              precision    recall  f1-score   support
-    Alakasiz       0.92      0.87      0.89       734
-     Barinma       0.87      0.79      0.83       207
-  Elektronik       0.72      0.73      0.73       130
-       Giysi       0.84      0.66      0.74        94
-    Kurtarma       0.84      0.80      0.82       362
-    Lojistik       0.75      0.51      0.61       112
-      Saglik       0.79      0.80      0.79       108
-          Su       0.63      0.47      0.54        78
-       Yagma       0.75      0.58      0.65        31
-       Yemek       0.80      0.77      0.79       117
-   micro avg       0.85      0.78      0.81      1973
-   macro avg       0.79      0.70      0.74      1973
-weighted avg       0.84      0.78      0.81      1973
- samples avg       0.84      0.82      0.82      1973
 ```

 ---
 ### Deprem NER Training Results
+```
+              precision    recall  f1-score   support
+           0       0.85      0.91      0.88       734
+           1       0.77      0.84      0.80       207
+           2       0.71      0.88      0.79       130
+           3       0.68      0.76      0.72        94
+           4       0.80      0.85      0.82       362
+           5       0.63      0.59      0.61       112
+           6       0.73      0.82      0.77       108
+           7       0.55      0.77      0.64        78
+           8       0.65      0.71      0.68        31
+           9       0.70      0.85      0.76       117
+   micro avg       0.77      0.85      0.81      1973
+   macro avg       0.71      0.80      0.75      1973
+weighted avg       0.77      0.85      0.81      1973
+ samples avg       0.82      0.87      0.83      1973
+```
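The `samples avg` row suggests a multi-label setup, in which each class probability is compared against a cutoff (the card lists `Threshold: 0.25` among the hyperparameters). A minimal sketch of that decision step follows, assuming sigmoid activations over per-class logits and a `>=` comparison. The function `predict_labels`, the example logits, and the index-to-label order are assumptions; the order is inferred only from the support counts, which match the label-named report row for row.

```python
import math

# Assumed label order: inferred from support counts that match the
# label-named report (734, 207, 130, ...), not confirmed by the card.
LABELS = ["Alakasiz", "Barinma", "Elektronik", "Giysi", "Kurtarma",
          "Lojistik", "Saglik", "Su", "Yagma", "Yemek"]
THRESHOLD = 0.25  # value listed in the card


def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=THRESHOLD):
    """Return every label whose sigmoid probability meets the threshold."""
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Hypothetical logits for one input text, one value per class
example_logits = [-3.0, 1.2, -2.5, -4.0, -1.0, -3.0, -2.0, -5.0, -6.0, 0.3]
print(predict_labels(example_logits))
```

With these logits, class 4 has a probability of roughly 0.27, so it is kept at a 0.25 cutoff but would be dropped at 0.5. Lowering the threshold trades precision for recall in this way.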
+### Preprocessing Functions
+```
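+import re
+
+# Assumes NLTK's stop-word corpus; run nltk.download("stopwords") once beforehand
+from nltk.corpus import stopwords
+
+# Turkish stop words plus four extra tokens to filter out
+tr_stopwords = stopwords.words('turkish')
+tr_stopwords.extend(["hic", "dm", "vs", "ya"])
+
+def remove_punct(tok):
+  # Strip every character that is neither a word character nor whitespace
+  tok = re.sub(r'[^\w\s]', '', tok)
+  return tok
+
+def normalize(tok):
+  # Collapse purely numeric tokens into a single placeholder
+  if tok.isdigit():
+    tok = "digit"
+  return tok
+
+def clean(tok):
+  tok = remove_punct(tok)
+  tok = normalize(tok)
+  return tok
+
+def exceptions(tok):
+  # Return False for tokens that should be dropped before cleaning
+  if not tok.isdigit() and len(tok)==1:  # single non-digit character
+    return False
+  if not tok:  # empty string left by consecutive spaces
+    return False
+  if tok in tr_stopwords:
+    return False
+  if tok.startswith('#') or tok.startswith("@"):  # hashtags and mentions
+    return False
+  return True
+
+# Split on single spaces, filter the raw tokens, then clean the survivors
+sm_tok = lambda text: [clean(tok) for tok in text.split(" ") if exceptions(tok)]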
+```
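To illustrate what `sm_tok` yields, the sketch below re-declares the helpers with a tiny stand-in stop-word list so that it runs without the NLTK corpus. The stand-in list and the sample sentence are assumptions made for illustration; the filtering and cleaning logic mirrors the functions in the card.

```python
import re

# Stand-in stop-word list (assumption): the card builds the real list from
# stopwords.words('turkish') and then adds "hic", "dm", "vs", "ya".
tr_stopwords = ["ve", "hic", "dm", "vs", "ya"]


def remove_punct(tok):
    # Strip characters that are neither word characters nor whitespace
    return re.sub(r"[^\w\s]", "", tok)


def normalize(tok):
    # Replace purely numeric tokens with a placeholder
    return "digit" if tok.isdigit() else tok


def clean(tok):
    return normalize(remove_punct(tok))


def exceptions(tok):
    # False means the raw token is dropped before cleaning
    if not tok.isdigit() and len(tok) == 1:
        return False
    if not tok:
        return False
    if tok in tr_stopwords:
        return False
    if tok.startswith("#") or tok.startswith("@"):
        return False
    return True


def sm_tok(text):
    return [clean(tok) for tok in text.split(" ") if exceptions(tok)]


# Hypothetical input text
tokens = sm_tok("@afad Hatay 12 kisi enkaz altinda, su ve battaniye lazim!! #deprem")
print(tokens)
```

Two properties follow from the order of operations: the filter runs on the raw token, so a punctuation-only token such as `!!` passes it and is emitted as an empty string after cleaning, and case is preserved because nothing lowercases the text.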
+### Other Hyperparameters
 ```
 training_args = TrainingArguments(
     output_dir="./output",
 )
 ```
 ```
+class_weights[0] = 1.0
+class_weights[1] = 1.5167249178108022
+class_weights[2] = 1.7547338578655642
+class_weights[3] = 1.9610520059358458
+class_weights[4] = 1.269341370129623
+class_weights[5] = 1.8684086209021484
+class_weights[6] = 1.8019018017117145
+class_weights[7] = 2.110648663094536
+class_weights[8] = 3.081208739200435
+class_weights[9] = 1.7994815143101963
+```
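The card does not say how these weights enter the loss. One common option for a multi-label head is to scale each class's binary cross-entropy term by its weight, as sketched below in plain Python. The function `weighted_bce` and the sample inputs are illustrative assumptions, not the training code. In PyTorch, a comparable effect is available through the `weight` or `pos_weight` arguments of `torch.nn.BCEWithLogitsLoss`.

```python
import math

# Per-class weights copied from the card (list index = class id)
class_weights = [
    1.0,
    1.5167249178108022,
    1.7547338578655642,
    1.9610520059358458,
    1.269341370129623,
    1.8684086209021484,
    1.8019018017117145,
    2.110648663094536,
    3.081208739200435,
    1.7994815143101963,
]


def weighted_bce(logits, targets, weights):
    """Mean over classes of weight * binary cross-entropy computed on logits."""
    total = 0.0
    for z, y, w in zip(logits, targets, weights):
        # Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
        loss = max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        total += w * loss
    return total / len(weights)


def loss_when_missing(class_id):
    """Loss when only `class_id` is a true label and every logit is strongly negative."""
    logits = [-4.0] * len(class_weights)
    targets = [0.0] * len(class_weights)
    targets[class_id] = 1.0
    return weighted_bce(logits, targets, class_weights)


loss_miss_0 = loss_when_missing(0)  # weight 1.0
loss_miss_8 = loss_when_missing(8)  # weight of about 3.08
print(loss_miss_0, loss_miss_8)
```

Under this assumed scheme, missing a class with a larger weight costs more than missing class 0, which pushes the model to attend to the rarer classes.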
+Threshold: 0.25
 ```