# NLP-Sentiment-Analysis-Airline-Tweets-with-BERT-V2
This repository contains a sentiment analysis project that leverages BERT, a leading NLP model.
The project covers pre-processing, tokenization, and fine-tuning of BERT for airline tweet sentiment classification.
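The README does not list the exact pre-processing steps, but a typical cleaning pass for airline tweets before tokenization looks like the sketch below (the regexes and the `clean_tweet` name are illustrative assumptions, not the notebook's actual code):

```python
import re

def clean_tweet(text: str) -> str:
    """Typical tweet cleaning before BERT tokenization (a sketch;
    the project's actual pre-processing steps may differ)."""
    text = re.sub(r"http\S+", "", text)        # drop URLs
    text = re.sub(r"@\w+", "", text)           # drop @mentions
    text = re.sub(r"#", "", text)              # keep hashtag words, drop '#'
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text

print(clean_tweet("@united flight delayed again... #fail http://t.co/x1"))
# flight delayed again... fail
```

The cleaned text is then passed to the BERT tokenizer, which handles subword splitting and special tokens itself.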
The model is fine-tuned from the pretrained "BERT base model (uncased)"
on the dataset at https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment.
The pipeline runs through several stages to reach the results; the evaluation summary is below.

Accuracy: 0.8204
Colab notebook for improvements: https://colab.research.google.com/drive/1IQen2iNXkjOgdzjyi7PQyLFqHyqHTF3A?usp=sharing
## Classification report for more detailed evaluation
|              | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| negative     | 0.88      | 0.90   | 0.89     | 959     |
| neutral      | 0.68      | 0.58   | 0.62     | 293     |
| positive     | 0.72      | 0.81   | 0.76     | 212     |
| accuracy     |           |        | 0.82     | 1464    |
| macro avg    | 0.76      | 0.76   | 0.76     | 1464    |
| weighted avg | 0.82      | 0.82   | 0.82     | 1464    |
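The macro and weighted averages in the report can be reproduced from the per-class rows. A quick cross-check in plain Python, using the precision and support values from the table above:

```python
# Per-class (precision, support) taken from the classification report above.
classes = {
    "negative": (0.88, 959),
    "neutral":  (0.68, 293),
    "positive": (0.72, 212),
}

total = sum(s for _, s in classes.values())                    # 1464 tweets
macro_p = sum(p for p, _ in classes.values()) / len(classes)   # unweighted mean
weighted_p = sum(p * s for p, s in classes.values()) / total   # support-weighted

print(round(macro_p, 2), round(weighted_p, 2))  # 0.76 0.82
```

The weighted average leans toward the negative class because it holds roughly two-thirds of the test support.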
The sentiment classification model, built on BertForSequenceClassification and trained for 10 epochs with the AdamW optimizer, achieved a promising overall accuracy of 82.04%. Performance was stable, with validation accuracy consistently between 0.79 and 0.81, indicating effective learning. Precision was highest for negative sentiment (0.88), with moderate scores for neutral (0.68) and positive (0.72) sentiment. Recall and F1-score metrics support these results and give a comprehensive view of performance across the sentiment classes. The confusion matrix shows strong alignment between model predictions and actual labels, though performance fluctuations across epochs point to opportunities for improvement, such as addressing overfitting or tuning hyperparameters.
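One common way to address the overfitting hinted at by the epoch-to-epoch fluctuations is early stopping on validation accuracy. This is not in the original notebook (which trains a fixed 10 epochs); the helper below and its accuracy values are an illustrative sketch:

```python
def best_epoch(val_accuracies, patience=2):
    """Return the index of the epoch to keep under simple early stopping:
    stop once validation accuracy has not improved for `patience` epochs.
    (Illustrative only; the original notebook trains a fixed 10 epochs.)"""
    best_i, best_acc, stale = 0, float("-inf"), 0
    for i, acc in enumerate(val_accuracies):
        if acc > best_acc:
            best_i, best_acc, stale = i, acc, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_i

# Validation accuracies fluctuating in the 0.79-0.81 band (made-up numbers):
print(best_epoch([0.79, 0.80, 0.81, 0.80, 0.79, 0.80]))  # 2
```

Keeping the checkpoint from the best validation epoch avoids reporting a model that has already begun to overfit.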
Developed by: Mastika