# XLM-RoBERTa-large-sag

## Model description

This is a model based on the [XLM-RoBERTa large](https://huggingface.co/xlm-roberta-large) architecture (provided by Facebook AI, see the original [paper](https://arxiv.org/abs/1911.02116)), additionally trained on two sets of medical-domain texts:

* 250,000 text reviews of medicines (about 1,000 tokens long on average) collected from the site irecommend.ru;
* the raw part of the [RuDReC corpus](https://github.com/cimm-kzn/RuDReC) (about 1.4 million texts, see the [paper](https://arxiv.org/abs/2004.03659)).

The additional training ran for one epoch on this data using a single Nvidia Tesla V100 GPU and the Hugging Face Transformers library.
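Since the checkpoint is a masked language model, it can be exercised with the standard Transformers `fill-mask` pipeline. A minimal sketch follows; the model path is a placeholder (substitute the actual Hub id or local directory of this checkpoint), and the example sentence is an arbitrary Russian drug-review phrase:

```python
# Sketch of masked-token prediction with this checkpoint.
# NOTE: "path/to/XLM-RoBERTa-large-sag" is a placeholder, not the
# published model id — point it at wherever the checkpoint lives.

def top_predictions(outputs, k=3):
    """Format fill-mask pipeline outputs as (token, score) pairs,
    highest-scoring first."""
    ranked = sorted(outputs, key=lambda o: o["score"], reverse=True)
    return [(o["token_str"], round(o["score"], 3)) for o in ranked[:k]]

if __name__ == "__main__":
    from transformers import pipeline  # Hugging Face Transformers

    fill_mask = pipeline("fill-mask", model="path/to/XLM-RoBERTa-large-sag")
    # XLM-RoBERTa checkpoints use "<mask>" as the mask token.
    outputs = fill_mask("Это лекарство мне очень <mask>.")
    for token, score in top_predictions(outputs):
        print(token, score)
```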

## BibTeX entry and citation info

If you have found our results helpful in your work, feel free to cite our publication as:

```
@article{sboev2021analysis,
  title={An analysis of full-size Russian complexly NER labelled corpus of Internet user reviews on the drugs based on deep learning and language neural nets},
  author={Sboev, Alexander and Sboeva, Sanna and Moloshnikov, Ivan and Gryaznov, Artem and Rybka, Roman and Naumov, Alexander and Selivanov, Anton and Rylkov, Gleb and Ilyin, Viacheslav},
  journal={arXiv preprint arXiv:2105.00059},
  year={2021}
}
```