metadata
license: apache-2.0
tags:
  - generated_from_trainer
base_model: indolem/indobertweet-base-uncased
metrics:
  - accuracy
  - precision
  - recall
  - f1
model-index:
  - name: er-model
    results: []
datasets:
  - SEACrowd/prdect_id
language:
  - id
widget:
  - text: Ini toko korup.,ga sesuai sama isinya..not recommended
    example_title: Contoh

indobertweet-base-uncased-emotion-recognition

Model description

This model is a fine-tuned version of indolem/indobertweet-base-uncased on the PRDECT-ID dataset, a collection of Indonesian product reviews annotated with emotion and sentiment labels. The reviews were gathered from Tokopedia, one of Indonesia's largest e-commerce platforms. The model achieves the following results on the evaluation set:

  • Loss: 0.6762
  • Accuracy: 0.6981
  • Precision: 0.7022
  • Recall: 0.6981
  • F1: 0.6963

It has been trained to classify text into five emotion categories: happy, sadness, anger, love, and fear.
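The card does not say how precision, recall, and F1 were averaged across the emotion classes. Accuracy and recall are identical in every reported row, which is consistent with support-weighted averaging (weighted recall always equals accuracy), so the sketch below assumes `average="weighted"`. The function name `compute_metrics` and the toy label ids are illustrative and are not taken from the original training script.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def compute_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall, and F1."""
    accuracy = accuracy_score(y_true, y_pred)
    # average="weighted" is an assumption; the model card does not state it.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0
    )
    return {
        "accuracy": float(accuracy),
        "precision": float(precision),
        "recall": float(recall),
        "f1": float(f1),
    }


# Toy example with three classes (integer label ids are placeholders).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = compute_metrics(y_true, y_pred)
print(metrics)
```

With weighted averaging, each class's score is weighted by its number of true examples, so the recall value collapses to the overall fraction of correct predictions.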

Training and evaluation data

I split the dataframe df into training, validation, and test sets (train_df, val_df, test_df) using the train_test_split function from sklearn.model_selection. The initial split holds out 20% of the data, and that held-out portion is then divided equally between the validation and test sets. Stratifying on the label column (stratify=df['label']) keeps the class distribution of each split close to that of the original dataset.
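The split described above can be sketched roughly as follows. This is a minimal reconstruction, not the original script: the helper name `split_dataframe`, the toy dataframe, the `random_state` value, and stratifying the second split on the held-out labels are all assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_dataframe(df, label_col="label", seed=42):
    """Stratified train/validation/test split (80/10/10 under these settings)."""
    # First split: hold out 20% of the rows, stratified on the label column.
    train_df, temp_df = train_test_split(
        df, test_size=0.2, stratify=df[label_col], random_state=seed
    )
    # Second split: divide the held-out rows equally into validation and test.
    # Stratifying on temp_df's labels is an assumption about the original code.
    val_df, test_df = train_test_split(
        temp_df, test_size=0.5, stratify=temp_df[label_col], random_state=seed
    )
    return train_df, val_df, test_df


# Toy data standing in for the real reviews: 100 rows, 5 balanced classes.
labels = ["happy", "sadness", "anger", "love", "fear"]
df = pd.DataFrame(
    {
        "text": [f"review {i}" for i in range(100)],
        "label": [labels[i % 5] for i in range(100)],
    }
)
train_df, val_df, test_df = split_dataframe(df)
print(len(train_df), len(val_df), len(test_df))
```

Because both calls are stratified, every class appears in each split in roughly the same proportion as in the full dataframe.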

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 64
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 5
  • mixed_precision_training: Native AMP
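With `lr_scheduler_type: linear`, the learning rate decays linearly from its initial value toward zero over the course of training. The sketch below reproduces that shape in plain Python, assuming no warmup steps (the card lists none) and a total of 1330 optimizer steps, which is the final step count in the training results. `linear_lr` is an illustrative helper, not a Transformers function.

```python
def linear_lr(step, base_lr=2e-5, total_steps=1330, warmup_steps=0):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the end of each epoch (266 steps per epoch in the results).
for epoch in range(1, 6):
    step = epoch * 266
    print(epoch, step, linear_lr(step))
```

Under these assumptions the rate is halved at step 665 and reaches zero at step 1330, so later epochs take progressively smaller updates.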

Training results

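| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|---------------|-------|------|-----------------|----------|-----------|--------|--------|
| 0.7817        | 1.0   | 266  | 0.6859          | 0.7057   | 0.7140    | 0.7057 | 0.7061 |
| 0.6052        | 2.0   | 532  | 0.6762          | 0.6981   | 0.7022    | 0.6981 | 0.6963 |
| 0.488         | 3.0   | 798  | 0.7251          | 0.7189   | 0.7208    | 0.7189 | 0.7192 |
| 0.3578        | 4.0   | 1064 | 0.7943          | 0.7208   | 0.7240    | 0.7208 | 0.7222 |
| 0.2887        | 5.0   | 1330 | 0.8250          | 0.7038   | 0.7093    | 0.7038 | 0.7056 |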
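The headline evaluation numbers in the model description match the epoch 2 row, which has the lowest validation loss, while epoch 4 has the highest accuracy and F1. The sketch below re-enters the results as plain Python data and shows how the selected epoch depends on the criterion. Whether the original run chose its checkpoint this way is not stated in the card; `best_epoch` and the column names are illustrative.

```python
COLUMNS = ("epoch", "train_loss", "val_loss", "accuracy", "precision", "recall", "f1")
ROWS = [
    (1, 0.7817, 0.6859, 0.7057, 0.7140, 0.7057, 0.7061),
    (2, 0.6052, 0.6762, 0.6981, 0.7022, 0.6981, 0.6963),
    (3, 0.4880, 0.7251, 0.7189, 0.7208, 0.7189, 0.7192),
    (4, 0.3578, 0.7943, 0.7208, 0.7240, 0.7208, 0.7222),
    (5, 0.2887, 0.8250, 0.7038, 0.7093, 0.7038, 0.7056),
]
results = [dict(zip(COLUMNS, row)) for row in ROWS]


def best_epoch(results, key, higher_is_better):
    """Return the epoch whose value for `key` is best under the given direction."""
    pick = max if higher_is_better else min
    return pick(results, key=lambda row: row[key])["epoch"]


print(best_epoch(results, "val_loss", higher_is_better=False))  # 2
print(best_epoch(results, "f1", higher_is_better=True))         # 4
```

Training loss keeps falling while validation loss rises after epoch 2, a pattern usually read as the onset of overfitting, which is one reason to prefer the lowest-validation-loss checkpoint.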

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1