lrei/rad-small

+---
+license: cc0-1.0
+---
+This is a [distilroberta-base](distilbert/distilroberta-base) model fine-tuned to classify text into 3 categories:
+- Rare Diseases
+- Non-Rare Diseases
+- Other
+The details of how this model was built and evaluated are provided in the article:
+Rei L, Pita Costa J, Zdolšek Draksler T. Automatic Classification and Visualization of Text Data on Rare Diseases. _Journal of Personalized Medicine_. 2024; 14(5):545. https://doi.org/10.3390/jpm14050545
+```
+@Article{jpm14050545,
+AUTHOR = {Rei, Luis and Pita Costa, Joao and Zdolšek Draksler, Tanja},
+TITLE = {Automatic Classification and Visualization of Text Data on Rare Diseases},
+JOURNAL = {Journal of Personalized Medicine},
+VOLUME = {14},
+YEAR = {2024},
+NUMBER = {5},
+ARTICLE-NUMBER = {545},
+URL = {https://www.mdpi.com/2075-4426/14/5/545},
+PubMedID = {38793127},
+ISSN = {2075-4426},
+DOI = {10.3390/jpm14050545}
+}
+```
+Note that the article fine-tunes the larger roberta-base model, whereas this is a smaller model. It is shared for demonstration and validation purposes, and its hyper-parameters were not tuned.
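The three-way classification described above can be tried with the `transformers` text-classification pipeline. The following is a minimal sketch, not the authors' own inference code: the Hub ID `lrei/rad-small` is assumed from the repository name, the helper names `top_label` and `classify` are hypothetical, and the label strings the model returns depend on its configuration, which this card does not list.

```python
from typing import List

# Assumption: the Hub ID is taken from the repository name, not from the card text.
MODEL_ID = "lrei/rad-small"


def top_label(scores: List[dict]) -> str:
    """Return the label with the highest score from a list of
    {"label": ..., "score": ...} dicts, as produced by the pipeline."""
    return max(scores, key=lambda item: item["score"])["label"]


def classify(texts: List[str]) -> List[str]:
    """Return the highest-scoring label for each input text."""
    # Imported lazily so that top_label can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=MODEL_ID)
    # top_k=None requests the scores of every class, not only the best one;
    # truncation=True cuts abstracts that exceed the model's maximum length.
    results = classifier(texts, top_k=None, truncation=True)
    return [top_label(scores) for scores in results]


# Example call (downloads the model on first use):
# classify(["We describe a cohort of patients diagnosed with Fabry disease."])
```

Requesting all class scores makes it possible to inspect how confident the model is in each of the 3 categories before taking the top one.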
+## Dataset
+The dataset used to train this model is available on [Zenodo](https://zenodo.org/records/13882003).
+It is a subset of abstracts obtained from PubMed and sorted into the 3 classes on the basis of their MeSH terms.
+Like the model, the dataset is provided for demonstration and methodology validation purposes. The original PubMed data was randomly under-sampled.
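The card does not say how the random under-sampling was carried out. As an illustration only, one common form caps the number of examples kept per class. The sketch below assumes that form; the function name `undersample`, the `per_class` cap, and the `(text, label)` tuple layout are all hypothetical and do not come from the authors' code.

```python
import random
from collections import defaultdict
from typing import List, Tuple

Example = Tuple[str, str]  # (abstract text, class label); layout is an assumption


def undersample(examples: List[Example], per_class: int, seed: int = 0) -> List[Example]:
    """Randomly keep at most `per_class` examples of each label."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible

    # Group the examples by their class label.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    # Draw a random subset from each class without replacement.
    sampled: List[Example] = []
    for label in sorted(by_label):
        group = by_label[label]
        sampled.extend(rng.sample(group, min(per_class, len(group))))

    rng.shuffle(sampled)  # avoid leaving the output grouped by class
    return sampled
```

Classes with fewer than `per_class` examples are kept whole, so the result is only balanced when every class has at least that many examples.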
+## Code
+The code used to create this model is available on [GitHub](https://github.com/lrei/rad).
+## Test Results
+Averaged over all 3 classes:
+| average | precision | recall | F1   |
+| ------- | --------- | ------ | ---- |
+| micro   | 0.84      | 0.84   | 0.84 |
+| macro   | 0.84      | 0.84   | 0.84 |
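To make the two averaging schemes in the table concrete, the sketch below computes micro- and macro-averaged precision, recall, and F1 for a single-label, multi-class task. It is not the authors' evaluation code; the function names and the label strings in the usage comment are hypothetical.

```python
from typing import Dict, List, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def averaged_scores(
    y_true: List[str], y_pred: List[str], labels: List[str]
) -> Dict[str, Tuple[float, float, float]]:
    """Return micro- and macro-averaged (precision, recall, F1).

    Assumes every true and predicted label appears in `labels`.
    """
    # Per-class counts stored as [true positives, false positives, false negatives].
    counts = {label: [0, 0, 0] for label in labels}
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            counts[truth][0] += 1
        else:
            counts[pred][1] += 1   # predicted class gains a false positive
            counts[truth][2] += 1  # true class gains a false negative

    # Macro: compute the metrics per class, then take the unweighted mean.
    per_class = [precision_recall_f1(*counts[label]) for label in labels]
    macro = tuple(sum(s[i] for s in per_class) / len(labels) for i in range(3))

    # Micro: pool the counts over all classes, then compute the metrics once.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = precision_recall_f1(tp, fp, fn)

    return {"micro": micro, "macro": macro}


# Example call with hypothetical label names:
# averaged_scores(["rare", "other"], ["rare", "rare"], ["rare", "non-rare", "other"])
```

When each example has exactly one true and one predicted label, micro-averaged precision, recall, and F1 all equal accuracy, which is consistent with the three identical micro values in the table.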