sismetanin/xlm_roberta_base-ru-sentiment-rusentiment

XLM-RoBERTa-Base-ru-sentiment-RuSentiment

XLM-RoBERTa-Base-ru-sentiment-RuSentiment is an XLM-RoBERTa-Base model fine-tuned on the RuSentiment dataset of general-domain Russian-language posts from VKontakte, the largest Russian social network.
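As a rough illustration of how the model might be used, the sketch below loads the checkpoint with the Hugging Face `transformers` library and classifies a batch of texts. It assumes that `torch` and `transformers` are installed and that the checkpoint under the repository ID at the top of this card ships tokenizer files and a sequence-classification head. The helper name `predict_sentiment` is hypothetical, and the label names come from whatever `id2label` mapping the checkpoint's config provides, which may be generic (e.g. `LABEL_0`).

```python
MODEL_ID = "sismetanin/xlm_roberta_base-ru-sentiment-rusentiment"


def predict_sentiment(texts, model_id=MODEL_ID):
    """Return a (label, probability) pair for each input text.

    Hypothetical helper for illustration; not part of the original repository.
    """
    # Imported lazily so this module can be loaded without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Tokenize the whole batch, padding to the longest text.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)

    results = []
    for row in probs:
        idx = int(row.argmax())
        # Label names are read from the checkpoint config (assumption: it defines them).
        results.append((model.config.id2label[idx], float(row[idx])))
    return results
```

Calling `predict_sentiment(["Отличный день!"])` downloads the checkpoint on first use and returns one `(label, probability)` pair per input text.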

Model	Score	Rank	SentiRuEval-2016 TC micro F1	SentiRuEval-2016 TC macro F1	SentiRuEval-2016 TC F1	SentiRuEval-2016 Banks micro F1	SentiRuEval-2016 Banks macro F1	SentiRuEval-2016 Banks F1	RuSentiment weighted F1	RuSentiment F1	KRND F1	LINIS Crowd F1	RuTweetCorp F1	RuReviews F1
SOTA	n/s		76.71	66.40	70.68	67.51	69.53	74.06	78.50	n/s	73.63	60.51	83.68	77.44
XLM-RoBERTa-Large	76.37	1	82.26	76.36	79.42	76.35	76.08	80.89	78.31	75.27	75.17	60.03	88.91	78.81
SBERT-Large	75.43	2	78.40	71.36	75.14	72.39	71.87	77.72	78.58	75.85	74.20	60.64	88.66	77.41
MBARTRuSumGazeta	74.70	3	76.06	68.95	73.04	72.34	71.93	77.83	76.71	73.56	74.18	60.54	87.22	77.51
Conversational RuBERT	74.44	4	76.69	69.09	73.11	69.44	68.68	75.56	77.31	74.40	73.10	59.95	87.86	77.78
LaBSE	74.11	5	77.00	69.19	73.55	70.34	69.83	76.38	74.94	70.84	73.20	59.52	87.89	78.47
XLM-RoBERTa-Base	73.60	6	76.35	69.37	73.42	68.45	67.45	74.05	74.26	70.44	71.40	60.19	87.90	78.28
RuBERT	73.45	7	74.03	66.14	70.75	66.46	66.40	73.37	75.49	71.86	72.15	60.55	86.99	77.41
MBART-50-Large-Many-to-Many	73.15	8	75.38	67.81	72.26	67.13	66.97	73.85	74.78	70.98	71.98	59.20	87.05	77.24
SlavicBERT	71.96	9	71.45	63.03	68.44	64.32	63.99	71.31	72.13	67.57	72.54	58.70	86.43	77.16
EnRuDR-BERT	71.51	10	72.56	64.74	69.07	61.44	60.21	68.34	74.19	69.94	69.33	56.55	87.12	77.95
RuDR-BERT	71.14	11	72.79	64.23	68.36	61.86	60.92	68.48	74.65	70.63	68.74	54.45	87.04	77.91
MBART-50-Large	69.46	12	70.91	62.67	67.24	61.12	60.25	68.41	72.88	68.63	70.52	46.39	86.48	77.52

The table reports per-task scores together with their macro-average, which determines a model’s position on the leaderboard. For datasets with multiple evaluation metrics (e.g., macro F1 and weighted F1 for RuSentiment), we use the unweighted average of those metrics as the task score when computing the overall macro-average. The same strategy for comparing models’ results was applied in the GLUE benchmark.
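The averaging scheme can be sketched in a few lines of Python, using the XLM-RoBERTa-Base row of the table as input. Treating the two SentiRuEval-2016 subsets (TC and Banks) as a single task with six metrics is an assumption made here because it reproduces the reported score; the function name `leaderboard_score` is likewise illustrative.

```python
from statistics import mean

# Metric values for XLM-RoBERTa-Base, copied from the table.
# Assumption: SentiRuEval-2016 (TC + Banks) counts as one task with six metrics.
xlm_roberta_base = {
    "SentiRuEval-2016": [76.35, 69.37, 73.42, 68.45, 67.45, 74.05],
    "RuSentiment": [74.26, 70.44],
    "KRND": [71.40],
    "LINIS Crowd": [60.19],
    "RuTweetCorp": [87.90],
    "RuReviews": [78.28],
}


def leaderboard_score(task_metrics):
    """Average the metrics within each task, then macro-average across tasks."""
    task_scores = [mean(values) for values in task_metrics.values()]
    return mean(task_scores)


score = leaderboard_score(xlm_roberta_base)
print(f"{score:.2f}")
```

Under this grouping the result lands within about 0.01 of the 73.60 reported for XLM-RoBERTa-Base; the small gap is presumably due to the table values already being rounded.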

Citation

If you find this repository helpful, feel free to cite our publication:

@article{Smetanin2021Deep,
  author = {Sergey Smetanin and Mikhail Komarov},
  title = {Deep transfer learning baselines for sentiment analysis in Russian},
  journal = {Information Processing \& Management},
  volume = {58},
  number = {3},
  pages = {102484},
  year = {2021},
  issn = {0306-4573},
  doi = {10.1016/j.ipm.2020.102484}
}

Dataset:

@inproceedings{rogers2018rusentiment,
  title={RuSentiment: An enriched sentiment analysis dataset for social media in Russian},
  author={Rogers, Anna and Romanov, Alexey and Rumshisky, Anna and Volkova, Svitlana and Gronas, Mikhail and Gribov, Alex},
  booktitle={Proceedings of the 27th international conference on computational linguistics},
  pages={755--763},
  year={2018}
}