Tweets disaster type classification model

This model was trained on part of the Disaster Tweet Corpus 2020 dataset (Analysis of Filtering Models for Disaster-Related Tweets, Wiegmann, M. et al., 2020). It achieves the following results on the evaluation set:

  • Train Loss: 0.0875
  • Train Accuracy: 0.8783
  • Validation Loss: 0.2980
  • Validation Accuracy: 0.8133
  • Epoch: 5

Model description

Labels
disease --- 1
earthquake --- 2
flood --- 3
hurricane & tornado --- 4
wildfire --- 5
industrial accident --- 6
societal crime --- 7
transportation accident --- 8
meteor crash --- 9
haze --- 0
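
For programmatic use, the label table above can be written as a lookup dictionary. This is a minimal sketch: the names `ID2LABEL` and `label_for` are illustrative and are not part of the published model or its configuration.

```python
# Class id -> disaster type, copied from the Labels table.
# ID2LABEL and label_for are hypothetical helper names.
ID2LABEL = {
    0: "haze",
    1: "disease",
    2: "earthquake",
    3: "flood",
    4: "hurricane & tornado",
    5: "wildfire",
    6: "industrial accident",
    7: "societal crime",
    8: "transportation accident",
    9: "meteor crash",
}


def label_for(class_id: int) -> str:
    """Return the disaster type name for a numeric class id."""
    return ID2LABEL[class_id]
```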

Intended uses & limitations

This model can detect 10 types of disaster (natural and human-made). It has trouble detecting type 0 (haze), because the training dataset contains few haze tweets and they resemble the type 5 (wildfire) tweets.

Training hyperparameters

The following hyperparameters were used during training:

  • optimizer:
    batch_size = 16
    num_epochs = 5
    batches_per_epoch = len(tokenized_tweet["train"])//batch_size
    total_train_steps = int(batches_per_epoch * num_epochs)
    optimizer, schedule = create_optimizer(init_lr=2e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
  • training_precision: float32
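
The step arithmetic above can be checked without TensorFlow. The sketch below reproduces the step count and, as an assumption, models the schedule as a linear decay from `init_lr` to zero, which is what `create_optimizer` is understood to apply by default when `num_warmup_steps=0`. The training-set size of 10,000 used in the usage note is hypothetical; the real size is not stated here.

```python
def total_train_steps(num_train_examples: int, batch_size: int = 16, num_epochs: int = 5) -> int:
    """Same arithmetic as the training snippet: full batches per epoch times epochs."""
    batches_per_epoch = num_train_examples // batch_size
    return int(batches_per_epoch * num_epochs)


def linear_decay_lr(step: int, total_steps: int, init_lr: float = 2e-5, end_lr: float = 0.0) -> float:
    """Learning rate at a given step under linear decay (assumed schedule, no warmup)."""
    step = min(step, total_steps)  # hold at end_lr once training is finished
    return (init_lr - end_lr) * (1 - step / total_steps) + end_lr
```

For example, `total_train_steps(10_000)` gives the step count for a hypothetical 10,000-tweet training split, and `linear_decay_lr(0, steps)` returns the initial rate of 2e-5.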

Framework versions

  • Transformers 4.16.2
  • TensorFlow 2.9.2
  • Datasets 2.4.0
  • Tokenizers 0.12.1

How to use it

from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("sacculifer/dimbat_disaster_type_distilbert")

model = TFAutoModelForSequenceClassification.from_pretrained("sacculifer/dimbat_disaster_type_distilbert")
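
Once loaded, the model can classify a tweet roughly as follows. This is a sketch, not a tested recipe: it assumes that the index of the largest logit equals the numeric id in the Labels section, which has not been verified against the model's configuration. `argmax` and `predict_class_id` are illustrative helper names.

```python
def argmax(values):
    """Index of the largest value in a plain Python list."""
    return max(range(len(values)), key=lambda i: values[i])


def predict_class_id(text, model_name="sacculifer/dimbat_disaster_type_distilbert"):
    """Return the predicted class id for one tweet (downloads the model on first use)."""
    # Imported here so the argmax helper can be used without TensorFlow installed.
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = TFAutoModelForSequenceClassification.from_pretrained(model_name)

    inputs = tokenizer(text, return_tensors="tf", truncation=True)
    logits = model(**inputs).logits  # shape: (1, number of classes)

    # Assumption: logit index == label id from the Labels section.
    return argmax(logits[0].numpy().tolist())
```

Calling `predict_class_id("Heavy rain has flooded the whole town")` returns an integer that can be looked up in the Labels section. Loading the tokenizer and model once and reusing them is preferable when classifying many tweets.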
