merve's picture
merve HF staff
Update README.md
cbade61
metadata
license: apache-2.0
language:
  - tr
tags:
  - deprem-clf-v1
metrics:
  - accuracy
  - recall
  - f1
library_name: transformers
pipeline_tag: text-classification
model-index:
  - name: deprem_v12
    results:
      - task:
          type: text-classification
        dataset:
          type: deprem_private_dataset_v1_2
          name: deprem_private_dataset_v1_2
        metrics:
          - type: recall
            value: 0.8
            verified: false
          - type: f1
            value: 0.75
            verified: false

Deprem NER Training Results

              precision    recall  f1-score   support

           0       0.85      0.91      0.88       734
           1       0.77      0.84      0.80       207
           2       0.71      0.88      0.79       130
           3       0.68      0.76      0.72        94
           4       0.80      0.85      0.82       362
           5       0.63      0.59      0.61       112
           6       0.73      0.82      0.77       108
           7       0.55      0.77      0.64        78
           8       0.65      0.71      0.68        31
           9       0.70      0.85      0.76       117

   micro avg       0.77      0.85      0.81      1973
   macro avg       0.71      0.80      0.75      1973
weighted avg       0.77      0.85      0.81      1973
 samples avg       0.82      0.87      0.83      1973

Preprocessing Funcs

tr_stopwords = stopwords.words('turkish')
tr_stopwords.append("hic")
tr_stopwords.append("dm")
tr_stopwords.append("vs")
tr_stopwords.append("ya")

def remove_punct(tok):
  tok = re.sub(r'[^\w\s]', '', tok)
  return tok

def normalize(tok):
  if tok.isdigit():
    tok = "digit"
  return tok

def clean(tok):
  tok = remove_punct(tok)
  tok = normalize(tok)

  return tok

def exceptions(tok):
  if not tok.isdigit() and len(tok)==1:
    return False

  if not tok:
    return False

  if tok in tr_stopwords:
    return False

  if tok.startswith('#') or tok.startswith("@"):
    return False
  
  return True


sm_tok = lambda text: [clean(tok) for tok in text.split(" ") if exceptions(tok)]

Other HyperParams

training_args = TrainingArguments(
    output_dir="./output",
    evaluation_strategy="epoch",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    weight_decay=0.01,
    report_to=None,
    num_train_epochs=4
)
class_weights[0] = 1.0
class_weights[1] = 1.5167249178108022
class_weights[2] = 1.7547338578655642
class_weights[3] = 1.9610520059358458
class_weights[4] = 1.269341370129623
class_weights[5] = 1.8684086209021484
class_weights[6] = 1.8019018017117145
class_weights[7] = 2.110648663094536
class_weights[8] = 3.081208739200435
class_weights[9] = 1.7994815143101963

Threshold: 0.25