DmitryPogrebnoy committed on
Commit
6df8eff
·
1 Parent(s): e492a73

Update README.md

  1. README.md +53 -1
README.md CHANGED
@@ -1,3 +1,55 @@
1
  ---
2
- license: gpl-3.0
3
  ---
1
  ---
2
+ language:
3
+ - ru
4
+ license: apache-2.0
5
  ---
6
+
7
+ # Model MedRuRobertaLarge
8
+
9
+ # Model Description
10
+
11
+ This model is a fine-tuned version of [ruRoberta-large](sberbank-ai/ruRoberta-large).
12
+ The fine-tuning code can be found [here](https://github.com/DmitryPogrebnoy/MedSpellChecker/blob/main/spellchecker/ml_ranging/models/med_ru_roberta_large/fine_tune_ru_roberta_large.py).
13
+ The model is fine-tuned on a specially collected dataset of over 30,000 medical anamneses in Russian.
14
+ The collected dataset can be found [here](https://github.com/DmitryPogrebnoy/MedSpellChecker/blob/main/data/anamnesis/processed/all_anamnesis.csv).
15
+
16
+ This model was created as part of a master's project to develop a method for correcting typos
17
+ in medical histories, using BERT models to rank candidate corrections.
18
+ The project is open source and can be found [here](https://github.com/DmitryPogrebnoy/MedSpellChecker).
19
+
20
+ # How to Get Started With the Model
21
+
22
+ You can use the model directly with a pipeline for masked language modeling:
23
+
24
+ ```python
25
+ >>> from transformers import pipeline
26
+ >>> fill_mask = pipeline('fill-mask', model='DmitryPogrebnoy/MedRuRobertaLarge')
27
+ >>> fill_mask("У пациента <mask> боль в грудине.")
28
+ [{'score': 0.2467374950647354,
29
+ 'token': 9233,
30
+ 'token_str': ' сильный',
31
+ 'sequence': 'У пациента сильный боль в грудине.'},
32
+ {'score': 0.16476310789585114,
33
+ 'token': 27876,
34
+ 'token_str': ' постоянный',
35
+ 'sequence': 'У пациента постоянный боль в грудине.'},
36
+ {'score': 0.07211139053106308,
37
+ 'token': 19551,
38
+ 'token_str': ' острый',
39
+ 'sequence': 'У пациента острый боль в грудине.'},
40
+ {'score': 0.0616639070212841,
41
+ 'token': 18840,
42
+ 'token_str': ' сильная',
43
+ 'sequence': 'У пациента сильная боль в грудине.'},
44
+ {'score': 0.029712719842791557,
45
+ 'token': 40176,
46
+ 'token_str': ' острая',
47
+ 'sequence': 'У пациента острая боль в грудине.'}]
48
+ ```
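The scores in the output above can be used to order a fixed set of candidate words, which is the kind of candidate ranking the project description mentions. The following is only a minimal sketch under stated assumptions: `rank_candidates` and `predict` are hypothetical helper names introduced here for illustration, not functions from the MedSpellChecker project, and the actual ranking method used there may differ.

```python
from typing import Dict, List


def rank_candidates(predictions: List[Dict], candidates: List[str]) -> List[str]:
    """Order `candidates` by the score found in fill-mask `predictions`.

    `predictions` is a list of dicts with 'token_str' and 'score' keys,
    shaped like the pipeline output shown above. Candidates that do not
    appear among the predictions get a score of 0.0 and end up last.
    """
    # token_str may carry a leading space (e.g. ' сильная'), so strip it.
    scores = {p["token_str"].strip(): p["score"] for p in predictions}
    return sorted(candidates, key=lambda c: scores.get(c, 0.0), reverse=True)


def predict(text: str, top_k: int = 5) -> List[Dict]:
    """Run the fill-mask pipeline on `text` (hypothetical convenience wrapper)."""
    # Imported lazily so rank_candidates can be used without transformers.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="DmitryPogrebnoy/MedRuRobertaLarge")
    return fill_mask(text, top_k=top_k)
```

For example, `rank_candidates(predict("У пациента <mask> боль в грудине."), ["острая", "сильная"])` would order the two candidates by the model's scores. Only candidates that occur among the top `top_k` predictions receive a non-zero score, so a larger `top_k` may be needed in practice.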
49
+
50
+ Alternatively, load the model and tokenizer directly for custom processing:
51
+
52
+ ```python
53
+ >>> from transformers import AutoModelForMaskedLM, AutoTokenizer
+ >>> tokenizer = AutoTokenizer.from_pretrained("DmitryPogrebnoy/MedRuRobertaLarge")
54
+ >>> model = AutoModelForMaskedLM.from_pretrained("DmitryPogrebnoy/MedRuRobertaLarge")
55
+ ```
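Once the model and tokenizer are loaded, the masked position can also be scored by hand. The sketch below is one possible way to do it, assuming PyTorch is installed and the input contains exactly one mask token; `mask_word` and `top_predictions` are hypothetical names introduced for illustration and are not part of the model card or the MedSpellChecker project.

```python
from typing import List, Tuple


def mask_word(text: str, word: str, mask_token: str = "<mask>") -> str:
    """Replace the first occurrence of `word` in `text` with the mask token."""
    if word not in text:
        raise ValueError(f"{word!r} not found in text")
    return text.replace(word, mask_token, 1)


def top_predictions(text: str, k: int = 5) -> List[Tuple[str, float]]:
    """Return the k most probable tokens for the first mask position in `text`."""
    # Imported lazily so mask_word can be used without torch/transformers.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    name = "DmitryPogrebnoy/MedRuRobertaLarge"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    # Index of the first mask token in the (single) input sequence.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_index = is_mask.nonzero(as_tuple=True)[0][0]

    with torch.no_grad():
        logits = model(**inputs).logits

    # Turn the vocabulary logits at the mask position into probabilities.
    probs = logits[0, mask_index].softmax(dim=-1)
    top = torch.topk(probs, k)
    return [
        (tokenizer.decode([token_id]), prob)
        for prob, token_id in zip(top.values.tolist(), top.indices.tolist())
    ]
```

A suspected typo could then be masked and re-predicted with something like `top_predictions(mask_word(sentence, suspect_word))`, where `sentence` and `suspect_word` are supplied by the caller.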