Model DmitryPogrebnoy/MedDistilBertBaseRuCased

Model Description

This model is a fine-tuned version of DmitryPogrebnoy/distilbert-base-russian-cased. The code for the fine-tuning process can be found here. The model was fine-tuned on a specially collected dataset of over 30,000 medical anamneses in Russian. The collected dataset can be found here.

This model was created as part of a master's project to develop a method for correcting typos in medical histories, using BERT models to rank correction candidates. The project is open source and can be found here.

How to Get Started With the Model

You can use the model directly with a pipeline for masked language modeling:

>>> from transformers import pipeline
>>> fill_mask = pipeline('fill-mask', model='DmitryPogrebnoy/MedDistilBertBaseRuCased')
>>> fill_mask("У пациента [MASK] боль в грудине.")
[{'score': 0.1733243614435196,
  'token': 6880,
  'token_str': 'имеется',
  'sequence': 'У пациента имеется боль в грудине.'},
 {'score': 0.08818087726831436,
  'token': 1433,
  'token_str': 'есть',
  'sequence': 'У пациента есть боль в грудине.'},
 {'score': 0.03620537742972374,
  'token': 3793,
  'token_str': 'особенно',
  'sequence': 'У пациента особенно боль в грудине.'},
 {'score': 0.03438418731093407,
  'token': 5168,
  'token_str': 'бол',
  'sequence': 'У пациента бол боль в грудине.'},
 {'score': 0.032936397939920425,
  'token': 6281,
  'token_str': 'протекает',
  'sequence': 'У пациента протекает боль в грудине.'}]
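
Since the model was built to rank correction candidates, the same pipeline can be restricted to a fixed list of candidate words. The sketch below is an illustration only, not the project's actual implementation: the helper name `rank_candidates` is hypothetical, and it assumes each candidate is a single token in the model's vocabulary (for a multi-token word the pipeline falls back to scoring only its first sub-token, so the returned `token_str` may differ from the candidate).

```python
def rank_candidates(fill_mask, masked_text, candidates):
    """Order candidate words by the score a fill-mask callable assigns them.

    `fill_mask` is assumed to behave like a transformers fill-mask pipeline:
    called with `targets` and `top_k`, it returns a list of dicts that
    contain 'token_str' and 'score'.
    """
    if not candidates:
        # The pipeline rejects an empty target list, so return early.
        return []
    # `targets` limits scoring to the given words; `top_k` is raised so that
    # no candidate is dropped by the pipeline's default cut-off.
    results = fill_mask(masked_text, targets=candidates, top_k=len(candidates))
    scored = [(r["token_str"], r["score"]) for r in results]
    # Highest score first: the best-fitting candidate comes out on top.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

To try it, pass the fill-mask pipeline object created above as the first argument, followed by a sentence in which the suspected typo has been replaced with [MASK] and the list of candidate corrections.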

Alternatively, load the model and tokenizer directly for custom use:

>>> from transformers import AutoTokenizer, AutoModelForMaskedLM
>>> tokenizer = AutoTokenizer.from_pretrained("DmitryPogrebnoy/MedDistilBertBaseRuCased")
>>> model = AutoModelForMaskedLM.from_pretrained("DmitryPogrebnoy/MedDistilBertBaseRuCased")
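
Once the model and tokenizer are loaded, predictions for a masked position can be read from the model's logits. The following is a minimal sketch rather than a reference implementation: the helper name `top_k_for_mask` is hypothetical, it assumes the text contains exactly one [MASK] and that PyTorch is installed (for `return_tensors="pt"`), and it computes the softmax in plain Python to keep the example short. With PyTorch imported, `torch.softmax` and `torch.topk` inside a `torch.no_grad()` block would be the more usual choice.

```python
import math


def top_k_for_mask(model, tokenizer, text, k=5):
    """Return the k most probable (token, probability) pairs for the [MASK] in text."""
    inputs = tokenizer(text, return_tensors="pt")
    input_ids = inputs["input_ids"][0].tolist()
    # Position of the mask token; raises ValueError if the text has no [MASK].
    mask_pos = input_ids.index(tokenizer.mask_token_id)
    # Logits over the whole vocabulary at the masked position.
    logits = model(**inputs).logits[0][mask_pos].tolist()
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    # Indices of the k largest probabilities, best first.
    best = sorted(range(len(exps)), key=exps.__getitem__, reverse=True)[:k]
    return [(tokenizer.convert_ids_to_tokens(i), exps[i] / total) for i in best]
```

Calling `top_k_for_mask(model, tokenizer, "У пациента [MASK] боль в грудине.")` with the objects loaded above should yield the same kind of ranking as the fill-mask pipeline.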