	---
	license: cc0-1.0
	base_model:
	- distilbert/distilroberta-base
	pipeline_tag: text-classification
	library_name: transformers
	---

	This is a [distilroberta-base](https://huggingface.co/distilbert/distilroberta-base) model fine-tuned to classify text into 3 categories:

	- Rare Diseases
	- Non-Rare Diseases
	- Other

	The details of how this model was built and evaluated are provided in the article:

	Rei L, Pita Costa J, Zdolšek Draksler T. Automatic Classification and Visualization of Text Data on Rare Diseases. _Journal of Personalized Medicine_. 2024; 14(5):545. https://doi.org/10.3390/jpm14050545

	```
	@Article{jpm14050545,
	AUTHOR = {Rei, Luis and Pita Costa, Joao and Zdolšek Draksler, Tanja},
	TITLE = {Automatic Classification and Visualization of Text Data on Rare Diseases},
	JOURNAL = {Journal of Personalized Medicine},
	VOLUME = {14},
	YEAR = {2024},
	NUMBER = {5},
	ARTICLE-NUMBER = {545},
	URL = {https://www.mdpi.com/2075-4426/14/5/545},
	PubMedID = {38793127},
	ISSN = {2075-4426},
	DOI = {10.3390/jpm14050545}
	}
	```
	Note that the article fine-tunes the larger roberta-base model, whereas this is the smaller distilroberta-base variant. This model is shared for demonstration and validation purposes. Hyper-parameters were not tuned.

	## Using this model
	The simplest way to use this model is via a Hugging Face transformers pipeline.

	```python
	# Use a pipeline as a high-level helper
	from transformers import pipeline

	pipe = pipeline("text-classification", model="lrei/rad-small")

	# Simple high-level usage
	pipe(["The patient suffers from a complex genetic disorder.", "The patient suffers from a common genetic disorder."])
	```
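By default the pipeline returns only the top label per input. When the scores for all 3 classes are of interest, passing `top_k=None` makes the pipeline return one score per class. The sketch below is a minimal illustration of that pattern: `load_classifier` and `best_labels` are hypothetical helper names, not part of this model's repository, and the label strings in the output depend on the model's configuration.

```python
from typing import Dict, List


def load_classifier(model_name: str = "lrei/rad-small"):
    """Build a pipeline that returns a score for every class (hypothetical helper)."""
    # Imported lazily so best_labels() can be used without loading transformers.
    from transformers import pipeline

    # top_k=None asks the pipeline for the scores of all labels, not just the best one.
    return pipeline("text-classification", model=model_name, top_k=None)


def best_labels(all_scores: List[List[Dict]]) -> List[Dict]:
    """Pick the highest-scoring {'label', 'score'} entry for each input text."""
    return [max(scores, key=lambda s: s["score"]) for scores in all_scores]


# Example usage (downloads the model on first call):
# pipe = load_classifier()
# scores = pipe(["The patient suffers from a complex genetic disorder."])
# print(best_labels(scores))
```

Keeping the full score list is useful when a prediction should be rejected or reviewed if the top score is low.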

	## Dataset

	The dataset used to train this model is available on [Zenodo](https://zenodo.org/records/13882003).
	It is a subset of abstracts obtained from PubMed and sorted into the 3 classes on the basis of their MeSH terms.

	Like the model, the dataset is provided for demonstration and methodology validation purposes. The original PubMed data was randomly under-sampled.
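Random under-sampling can be sketched as follows. This is an illustration only: the card does not say whether sampling was done per class or over the whole corpus, so the per-class cap, the `undersample` function, and the `(text, label)` record layout are all assumptions rather than the procedure used to build the Zenodo dataset.

```python
import random
from collections import defaultdict
from typing import List, Tuple

Record = Tuple[str, str]  # (abstract text, class label); assumed layout


def undersample(records: List[Record], per_class: int, seed: int = 0) -> List[Record]:
    """Randomly keep at most `per_class` records for each label."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible

    # Group records by their class label.
    by_label = defaultdict(list)
    for record in records:
        by_label[record[1]].append(record)

    # Draw a random subset from each class without replacement.
    sample: List[Record] = []
    for label in sorted(by_label):
        items = by_label[label]
        sample.extend(rng.sample(items, min(per_class, len(items))))

    rng.shuffle(sample)  # avoid leaving the output grouped by class
    return sample
```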

	## Code
	The code used to create this model is available on [GitHub](https://github.com/lrei/rad).

	## Test Results

	Averaged over all 3 classes:

	| average | precision | recall | F1   |
	| ------- | --------- | ------ | ---- |
	| micro   | 0.84      | 0.84   | 0.84 |
	| macro   | 0.84      | 0.84   | 0.84 |
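
For readers unfamiliar with the two averaging schemes, the sketch below computes micro and macro precision, recall, and F1 from a confusion matrix. The matrix values are made up for illustration and are not this model's test results; the function names are hypothetical. In single-label multi-class classification, micro-averaged precision, recall, and F1 all equal accuracy, which is why the three micro values in a table like this are identical.

```python
from typing import List, Tuple


def per_class_counts(cm: List[List[int]]) -> List[Tuple[int, int, int]]:
    """Return (TP, FP, FN) per class; cm[i][j] = items of true class i predicted as j."""
    n = len(cm)
    counts = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, but true class differs
        fn = sum(cm[k]) - tp                       # true k, but predicted otherwise
        counts.append((tp, fp, fn))
    return counts


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_average(cm: List[List[int]]) -> Tuple[float, float, float]:
    """Pool the counts of all classes, then compute the metrics once."""
    counts = per_class_counts(cm)
    return prf(sum(c[0] for c in counts), sum(c[1] for c in counts), sum(c[2] for c in counts))


def macro_average(cm: List[List[int]]) -> Tuple[float, float, float]:
    """Compute the metrics per class, then take the unweighted mean."""
    scores = [prf(*c) for c in per_class_counts(cm)]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
example_cm = [
    [80, 15, 5],
    [10, 85, 5],
    [5, 5, 90],
]
micro = micro_average(example_cm)
macro = macro_average(example_cm)
```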