---
license: cc0-1.0
base_model:
- distilbert/distilroberta-base
pipeline_tag: text-classification
library_name: transformers
---

This is a [distilroberta-base](distilbert/distilroberta-base) model fine-tuned to classify text into 3 categories:

- Rare Diseases
- Non-Rare Diseases
- Other

The details of how this model was built and evaluated are provided in the article:

Rei L, Pita Costa J, Zdolšek Draksler T. Automatic Classification and Visualization of Text Data on Rare Diseases. _Journal of Personalized Medicine_. 2024; 14(5):545. https://doi.org/10.3390/jpm14050545

```
@Article{jpm14050545,
  AUTHOR = {Rei, Luis and Pita Costa, Joao and Zdolšek Draksler, Tanja},
  TITLE = {Automatic Classification and Visualization of Text Data on Rare Diseases},
  JOURNAL = {Journal of Personalized Medicine},
  VOLUME = {14},
  YEAR = {2024},
  NUMBER = {5},
  ARTICLE-NUMBER = {545},
  URL = {https://www.mdpi.com/2075-4426/14/5/545},
  PubMedID = {38793127},
  ISSN = {2075-4426},
  DOI = {10.3390/jpm14050545}
}
```

Note that in the article the larger roberta-base model is fine-tuned instead; this is a smaller model. It is shared for demonstration and validation purposes, and hyper-parameters were not tuned.

## Using this model

The simplest way to use this model is via a Hugging Face transformers pipeline.

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="lrei/rad-small")

# Simple high-level usage
pipe([
    "The patient suffers from a complex genetic disorder.",
    "The patient suffers from a common genetic disorder.",
])
```

## Dataset

The dataset used to train this model is available on [Zenodo](https://zenodo.org/records/13882003). It is a subset of abstracts obtained from PubMed and sorted into the 3 classes on the basis of their MeSH terms. Like the model, the dataset is provided for demonstration and methodology validation purposes. The original PubMed data was randomly under-sampled.

## Code

The code used to create this model is available on [GitHub](https://github.com/lrei/rad).

## Test Results

Averaged over all 3 classes:

| average | precision | recall | F1   |
| ------- | --------- | ------ | ---- |
| micro   | 0.84      | 0.84   | 0.84 |
| macro   | 0.84      | 0.84   | 0.84 |
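
Micro and macro averages like those above can be recomputed on any labeled test set. The sketch below is illustrative only (it is not the evaluation script from the article): it loads the model with the lower-level `transformers` API instead of the pipeline helper and scores predictions with scikit-learn. The example texts and gold label ids are placeholders; substitute your own data and use the label mapping stored in the model config.

```python
# Minimal sketch: lower-level inference plus micro/macro-averaged metrics.
# The texts and gold label ids below are illustrative placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sklearn.metrics import precision_recall_fscore_support

model_name = "lrei/rad-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

texts = [
    "The patient suffers from a complex genetic disorder.",
    "The patient suffers from a common genetic disorder.",
]
labels = [0, 1]  # gold class ids for the texts above (placeholder values)

with torch.no_grad():
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    logits = model(**inputs).logits
    preds = logits.argmax(dim=-1).tolist()

# Map predicted ids to the label names stored in the model config.
print([model.config.id2label[p] for p in preds])

# Micro and macro averages, as in the table above.
for avg in ("micro", "macro"):
    p, r, f1, _ = precision_recall_fscore_support(
        labels, preds, average=avg, zero_division=0
    )
    print(f"{avg}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```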