from classifier import classify
from PIL import Image
import streamlit as st

st.title("Twitter Sentiment Analysis Using a BERT Model")

st.subheader("Motivation")
st.markdown("""
Social media has shrunk the digital world significantly, making it easy for fake news to spread like wildfire.
According to reports [6], 36.7 percent of the total population has felt cyberbullied at some point in their lifetime.
Since the level of offensiveness is subjective, conventional sentiment analysis might not do a perfect job of classifying such content.
One way around this is to train deep learning models on large, diverse datasets so that the model generalizes better.

Hugging Face Spaces provides an easy interface for testing models before use and for sharing them with ease.
""")

st.subheader("Play with the model")

# Read a tweet from the user and show the label predicted by the classifier.
text = st.text_input(
    "Enter a tweet to classify it as either Normal or Abusive. (Press Enter to submit)",
    value="I love DCNM course",
    max_chars=512,
    key=None,
    type="default",
    help=None,
    autocomplete=None,
)
st.markdown(f"The tweet is classified as: **{classify(text)}**")

st.markdown("For an abusive example, try: _Avatar is a crappy movie_")

st.subheader("About the model")
st.markdown("""
The model was trained on the ENCASEH2020 Twitter dataset from Founta, A.M. et al. (2018) [3]. The BERT Tiny model [1][2][5] was chosen for this project because, empirically,
it gave better results with the fewest parameters. The model was trained for 10 epochs with a batch size of 32, using the AdamW optimizer with a learning rate of 1e-2 and cross-entropy loss.
""")

st.image("./images/train_val_accuracy.png", caption="Training and validation accuracy - about 96 percent on average", use_column_width=True)
st.image("./images/train_test_scores.png", caption="Classification report - F1 score of 0.96 for both classes", use_column_width=True)
st.image("./images/confusion_matrix.png", caption="Confusion matrix - only 217 of the 5430 data points in the test dataset are misclassified", use_column_width=True)
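# Sanity check for the confusion-matrix caption: a minimal sketch that derives
# the test accuracy from the counts quoted there (217 misclassified out of 5430
# test points). The names below are illustrative assumptions and do not come
# from the original training code.
TEST_TOTAL = 5430
TEST_MISCLASSIFIED = 217
test_accuracy = (TEST_TOTAL - TEST_MISCLASSIFIED) / TEST_TOTAL  # roughly 0.96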

st.subheader("References")
st.markdown("1. [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)")
st.markdown("2. [TinyBERT: Distilling BERT for Natural Language Understanding](https://arxiv.org/abs/1909.10351)")
st.markdown("3. [Founta, A.M., Djouvas, C., Chatzakou, D., Leontiadis, I., Blackburn, J., Stringhini, G., Vakali, A., Sirivianos, M., & Kourtellis, N. (2018). Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior. In 12th International Conference on Web and Social Media, ICWSM 2018.](https://arxiv.org/abs/1802.00393)")
st.markdown("4. [Nandagopan D, Kowsik, Dinesh, Navaneeth, S Ram, Ajay, & C N, Amarnath (2022). End-to-End Messaging System Enhancement using Federated Learning for Cyberbullying Detection. DOI: 10.13140/RG.2.2.35686.70722.](https://github.com/Cubemet/bert-models)")
st.markdown("5. [Base Model from nreimers](https://huggingface.co/nreimers/BERT-Tiny_L-2_H-128_A-2)")
st.markdown("6. [IHPL, Cyberbullying, a Growing Public Health Concern (Aug 2018)](https://ihpl.llu.edu/blog/cyberbullying-growing-public-health-concern)")