# NLP-Sentiment-Analysis-Airline-Tweets-with-BERT-V2

This repository contains a sentiment analysis project that leverages BERT, a leading NLP model.
The project covers pre-processing, tokenization, and fine-tuning of BERT for airline tweet sentiment classification.
It starts from the original "BERT base model (uncased)" (`bert-base-uncased`) checkpoint,
uses the Twitter US Airline Sentiment dataset: https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment,
and goes through several stages to reach the evaluation results below.
  Accuracy: 0.8203551912568307
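
A minimal sketch of the pre-processing and tokenization stage is shown below. It assumes the Kaggle CSV has been downloaded as `Tweets.csv` and uses the Hugging Face `transformers` tokenizer; the file name, column names, `max_length`, and split ratio are assumptions for illustration, not values taken from this repository.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from transformers import BertTokenizer

# Load the CrowdFlower airline-sentiment CSV (file/column names assumed
# from the public Kaggle dataset, not from this repository).
df = pd.read_csv("Tweets.csv")[["text", "airline_sentiment"]].dropna()

# Map the three sentiment labels to integer class ids.
label2id = {"negative": 0, "neutral": 1, "positive": 2}
labels = df["airline_sentiment"].map(label2id).tolist()

# Hold out a validation split (ratio assumed for illustration).
train_texts, val_texts, train_labels, val_labels = train_test_split(
    df["text"].tolist(), labels, test_size=0.1, random_state=42
)

# Tokenize with the original "bert-base-uncased" checkpoint.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
train_enc = tokenizer(train_texts, truncation=True, padding=True, max_length=128)
val_enc = tokenizer(val_texts, truncation=True, padding=True, max_length=128)
```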

  Colab notebook for improvements: https://colab.research.google.com/drive/1IQen2iNXkjOgdzjyi7PQyLFqHyqHTF3A?usp=sharing

## Classification report for more detailed evaluation

|              | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| negative     |   0.88    |  0.90  |   0.89   |   959   |
| neutral      |   0.68    |  0.58  |   0.62   |   293   |
| positive     |   0.72    |  0.81  |   0.76   |   212   |
| accuracy     |           |        |   0.82   |  1464   |
| macro avg    |   0.76    |  0.76  |   0.76   |  1464   |
| weighted avg |   0.82    |  0.82  |   0.82   |  1464   |
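
A report in this format can be produced with scikit-learn's `classification_report`. A minimal sketch follows; the `y_true`/`y_pred` values here are placeholders, whereas in the real pipeline they would be the validation labels and the model's predicted class ids.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Placeholder predictions: in the actual pipeline these come from the
# fine-tuned model's argmax over logits on the validation split.
y_true = [0, 0, 1, 2, 2, 1, 0]   # integer class ids (illustrative only)
y_pred = [0, 0, 1, 2, 1, 1, 0]

target_names = ["negative", "neutral", "positive"]
print(classification_report(y_true, y_pred, target_names=target_names, digits=2))

# The confusion matrix discussed below can be obtained the same way.
print(confusion_matrix(y_true, y_pred))
```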

The sentiment classification model achieved a promising overall accuracy of 82.04%. It is built on BertForSequenceClassification and was trained for 10 epochs with the AdamW optimizer. Performance was stable, with validation accuracy consistently between 0.79 and 0.81, indicating effective learning. Precision was high for negative sentiment (0.88) and moderate for neutral (0.68) and positive (0.72) sentiments; the accompanying recall and F1-score metrics give a comprehensive view of performance across the sentiment classes. The confusion matrix shows strong alignment between model predictions and actual labels, though performance fluctuations across epochs suggest room for improvement, such as addressing overfitting or tuning hyperparameters.
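
A condensed sketch of the fine-tuning setup described above (`BertForSequenceClassification`, 10 epochs, AdamW) is shown below. Only the model class, epoch count, and optimizer are stated in this README; the batch size, learning rate, and dataset wrapping are assumptions for illustration.

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Three labels: negative / neutral / positive.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
).to(device)
optimizer = AdamW(model.parameters(), lr=2e-5)  # learning rate assumed

# train_enc / train_labels come from the tokenization sketch above.
train_dataset = TensorDataset(
    torch.tensor(train_enc["input_ids"]),
    torch.tensor(train_enc["attention_mask"]),
    torch.tensor(train_labels),
)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)  # batch size assumed

model.train()
for epoch in range(10):  # trained for 10 epochs
    for input_ids, attention_mask, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(
            input_ids=input_ids.to(device),
            attention_mask=attention_mask.to(device),
            labels=labels.to(device),
        )
        outputs.loss.backward()
        optimizer.step()
```

Evaluating on the held-out split after each epoch (predicting with `torch.argmax` over the logits) is how the per-epoch validation accuracy and the classification report above would be obtained.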

Developed by: Mastika