Multi-lingual sentiment prediction trained from COVID19-related tweets

Repository: https://github.com/clampert/multilingual-sentiment-analysis/

The model was trained on a large-scale dataset (18,437,530 examples) of multi-lingual tweets collected between March 2020 and November 2021 using Twitter’s Streaming API with varying COVID19-related keywords. Labels were auto-generated based on the presence of positive and negative emoticons. For details on the dataset, see our IEEE BigData 2021 publication.
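To illustrate the idea of emoticon-based labelling, here is a minimal sketch. The emoticon sets and the function name `emoticon_label` are hypothetical, chosen only for illustration, and the rule of discarding tweets with mixed or no emoticons is an assumption; the actual procedure is described in the publication.

```python
from typing import Optional

# Hypothetical emoticon sets for illustration only; the lists used to
# build the real dataset may differ.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", "😀", "😊"}
NEGATIVE_EMOTICONS = {":(", ":-(", "😢", "😠"}


def emoticon_label(text: str) -> Optional[str]:
    """Return 'positive', 'negative', or None if the signal is absent or mixed."""
    has_positive = any(e in text for e in POSITIVE_EMOTICONS)
    has_negative = any(e in text for e in NEGATIVE_EMOTICONS)
    if has_positive and not has_negative:
        return "positive"
    if has_negative and not has_positive:
        return "negative"
    # Assumption: tweets with no emoticon, or with both kinds, get no label.
    return None
```

In a pipeline like this, tweets for which the function returns `None` would simply be left out of the training set.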

The base model is sentence-transformers/stsb-xlm-r-multilingual. It was fine-tuned for sequence classification with positive and negative labels for two epochs (48 hours on 8xP100 GPUs).
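Since the model is a sequence classifier, it can presumably be used through the `text-classification` pipeline of the `transformers` library. The sketch below assumes the model ID `clampert/multilingual-sentiment-covid19`, which is not stated in this card, so check the model page and replace it if needed. The helper names `load_classifier` and `predict_labels` are hypothetical, and the exact label strings returned depend on the model configuration.

```python
from typing import Callable, Dict, List

# Assumed model ID; verify it against the model page before use.
MODEL_ID = "clampert/multilingual-sentiment-covid19"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily; requires `transformers`
    return pipeline("text-classification", model=model_id)


def predict_labels(texts: List[str],
                   classifier: Callable[[List[str]], List[Dict]]) -> List[str]:
    """Run the classifier on a batch of texts and keep only the predicted labels."""
    return [result["label"] for result in classifier(texts)]


# Example usage (needs network access and the `transformers` package):
#   classifier = load_classifier()
#   print(predict_labels(["I am very happy.", "Ich bin sehr traurig."], classifier))
```

Each pipeline result is a dictionary with a `label` and a `score`; the helper keeps only the label for brevity.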

Citation

If you use our model in your work, please cite:

@inproceedings{lampert2021overcoming,
  title={Overcoming Rare-Language Discrimination in Multi-Lingual Sentiment Analysis},
  author={Jasmin Lampert and Christoph H. Lampert},
  booktitle={IEEE International Conference on Big Data (BigData)},
  year={2021},
  note={Special Session: Machine Learning on Big Data},
}

Enjoy!
