Commit dd9f2a3 by seara · 1 parent: bd71104

Create README.md

  1. README.md +80 -0
README.md ADDED
@@ -0,0 +1,80 @@
+ ---
+ license: mit
+ language:
+ - ru
+ metrics:
+ - f1
+ - roc_auc
+ - precision
+ - recall
+ pipeline_tag: text-classification
+ tags:
+ - rubert
+ - sentiment
+ datasets:
+ - sismetanin/rureviews
+ - RuSentiment
+ - LinisCrowd2015
+ - LinisCrowd2016
+ - KaggleRussianNews
+ ---
+
+ This is a [RuBERT-tiny2](https://huggingface.co/cointegrated/rubert-tiny2) model fine-tuned for __sentiment classification__ of short __Russian__ texts.
+ The task is __multi-class classification__ with the following labels:
+
+ ```yaml
+ 0: neutral
+ 1: positive
+ 2: negative
+ ```
+
+ ## Usage
+
+ ```python
+ from transformers import pipeline
+ model = pipeline(model="seara/rubert-tiny2-russian-sentiment")
+ model("Привет, ты мне нравишься!")  # "Hi, I like you!"
+ # [{'label': 'positive', 'score': 0.9398769736289978}]
+ ```
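To make the `label` and `score` fields in the output above more concrete, here is a minimal sketch of how three class logits could be turned into such a result. It assumes the score is a softmax probability over the three classes, which is the usual default for single-label text-classification pipelines but is not confirmed by this README. The helper names `softmax` and `decode` and the example logits are hypothetical, chosen only for illustration.

```python
import math

# Label ids as listed in this README.
ID2LABEL = {0: "neutral", 1: "positive", 2: "negative"}


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return the top label and its probability in a pipeline-like dict."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


# Hypothetical logits for illustration only, not real model output.
example_logits = [0.1, 3.0, -1.2]
result = decode(example_logits)
print(result["label"], round(result["score"], 3))
```

The index with the largest logit decides the label, so the mapping from class id to label name is the only model-specific part of this sketch.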
+
+ ## Dataset
+
+ This model was trained on the union of the following datasets:
+
+ - Kaggle Russian News Dataset
+ - Linis Crowd 2015
+ - Linis Crowd 2016
+ - RuReviews
+ - RuSentiment
+
+ An overview of the training data can be found in [S. Smetanin's GitHub repository](https://github.com/sismetanin/sentiment-analysis-in-russian).
+
+ __Download links for all Russian sentiment datasets collected by Smetanin can be found in this [repository](https://github.com/searayeah/russian-sentiment-emotions-datasets).__
+
+ ## Training
+
+ Training was done in this [project](https://github.com/searayeah/vkr-bert) with the following parameters:
+
+ ```yaml
+ max_length: 512
+ batch_size: 64
+ optimizer: adam
+ lr: 0.00001
+ weight_decay: 0
+ num_epochs: 5
+ ```
+
+ The train/validation/test split is 80%/10%/10%.
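As an illustration of the 80%/10%/10% split, the sketch below shuffles a list and cuts it into three parts. It is a plain random split under stated assumptions: the README does not say whether the project's split was stratified or which seed was used, and the function name `split_80_10_10` and the seed value are hypothetical.

```python
import random


def split_80_10_10(items, seed=42):
    """Shuffle a copy of items and cut it into train/validation/test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * 0.8)
    n_val = int(len(shuffled) * 0.1)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test part
    return train, val, test


# Toy data: 100 integer ids standing in for labelled texts.
train, val, test = split_80_10_10(range(100))
print(len(train), len(val), len(test))
```

Giving the remainder to the last part keeps every item in exactly one split even when the dataset size is not divisible by ten.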
+
+ ## Eval results (on test split)
+
+
+ |         |neutral|positive|negative|macro avg|weighted avg|
+ |---------|-------|--------|--------|---------|------------|
+ |precision|0.69   |0.83    |0.74    |0.75     |0.75        |
+ |recall   |0.73   |0.82    |0.68    |0.75     |0.75        |
+ |f1-score |0.71   |0.83    |0.71    |0.75     |0.75        |
+ |support  |5196   |3831    |3599    |12626    |12626       |
+ |auc-roc  |0.84   |0.95    |0.90    |0.90     |0.89        |
+
+
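The two average columns can be sanity-checked from the per-class columns. The sketch below assumes that "macro avg" is the unweighted mean over the three classes and "weighted avg" is the support-weighted mean, as in scikit-learn's `classification_report`; the README does not state this explicitly. Because the inputs are the rounded values from the table, the recomputed averages can differ from the reported ones by about 0.01 (recall is one such case).

```python
# Per-class values copied from the table (order: neutral, positive, negative).
support = [5196, 3831, 3599]
per_class = {
    "precision": [0.69, 0.83, 0.74],
    "recall": [0.73, 0.82, 0.68],
    "f1-score": [0.71, 0.83, 0.71],
    "auc-roc": [0.84, 0.95, 0.90],
}


def macro_avg(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Mean over classes weighted by the number of test examples per class."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


total_support = sum(support)
averages = {
    name: (macro_avg(values), weighted_avg(values, support))
    for name, values in per_class.items()
}

print("total support:", total_support)
for name, (macro, weighted) in averages.items():
    print(f"{name}: macro={macro:.3f} weighted={weighted:.3f}")
```

The per-class supports sum to the 12626 shown in the support row, which matches the size of the test split.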