Update README.md
README.md
CHANGED
@@ -1,3 +1,49 @@
- ---
- license:
- ---
+ ---
+ license: cc0-1.0
+ ---
+
+ This is a [distilroberta-base](distilbert/distilroberta-base) model fine-tuned to classify text into 3 categories:
+
+ - Rare Diseases
+ - Non-Rare Diseases
+ - Other
+
+ The details of how this model was built and evaluated are provided in the article:
+
+ Rei L, Pita Costa J, Zdolšek Draksler T. Automatic Classification and Visualization of Text Data on Rare Diseases. _Journal of Personalized Medicine_. 2024; 14(5):545. https://doi.org/10.3390/jpm14050545
+
+ ```
+ @Article{jpm14050545,
+   AUTHOR = {Rei, Luis and Pita Costa, Joao and Zdolšek Draksler, Tanja},
+   TITLE = {Automatic Classification and Visualization of Text Data on Rare Diseases},
+   JOURNAL = {Journal of Personalized Medicine},
+   VOLUME = {14},
+   YEAR = {2024},
+   NUMBER = {5},
+   ARTICLE-NUMBER = {545},
+   URL = {https://www.mdpi.com/2075-4426/14/5/545},
+   PubMedID = {38793127},
+   ISSN = {2075-4426},
+   DOI = {10.3390/jpm14050545}
+ }
+ ```
+ Note that in the article the larger roberta-base model is fine-tuned instead; this model is smaller. It is shared for demonstration and validation purposes, and its hyper-parameters were not tuned.
+
+ ## Dataset
+
+ The dataset used to train this model is available on [Zenodo](https://zenodo.org/records/13882003).
+ It is a subset of abstracts obtained from PubMed and sorted into the 3 classes on the basis of their MeSH terms.
+
+ Like the model, the dataset is provided for demonstration and methodology validation purposes. The original PubMed data was randomly under-sampled.
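Random under-sampling of the kind mentioned above could look roughly like the following sketch. The per-class cap, the seed, the `undersample` helper, and the `(text, label)` record layout are all assumptions made for illustration; this is not the procedure used to build the Zenodo dataset.

```python
import random
from collections import defaultdict


def undersample(records, max_per_class, seed=0):
    """Randomly keep at most `max_per_class` records for each label.

    `records` is a list of (text, label) pairs; this layout is hypothetical.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_label = defaultdict(list)
    for text, label in records:
        by_label[label].append((text, label))

    sampled = []
    for label in sorted(by_label):
        group = by_label[label]
        if len(group) > max_per_class:
            group = rng.sample(group, max_per_class)
        sampled.extend(group)
    rng.shuffle(sampled)  # avoid returning the data grouped by class
    return sampled


# Toy data standing in for PubMed abstracts (not real records).
toy = (
    [(f"rare abstract {i}", "Rare Diseases") for i in range(3)]
    + [(f"non-rare abstract {i}", "Non-Rare Diseases") for i in range(10)]
    + [(f"other abstract {i}", "Other") for i in range(50)]
)
balanced = undersample(toy, max_per_class=5)
```

Classes that are already at or below the cap are kept whole, so only the over-represented classes lose records.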
+
+ ## Code
+ The code used to create this model is available on [GitHub](https://github.com/lrei/rad).
+
+ ## Test Results
+
+ Averaged over all 3 classes:
+
+ | average | precision | recall | F1   |
+ | ------- | --------- | ------ | ---- |
+ | micro   | 0.84      | 0.84   | 0.84 |
+ | macro   | 0.84      | 0.84   | 0.84 |
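To make the two averaging modes in the table concrete, here is a minimal sketch of how micro and macro averages differ for a single-label, 3-class problem. The `averaged_scores` helper and the toy labels are illustrative assumptions, not the evaluation code from the repository, and macro F1 is taken here as the unweighted mean of the per-class F1 scores.

```python
def averaged_scores(y_true, y_pred, labels):
    """Return {"micro": (p, r, f1), "macro": (p, r, f1)} for single-label data."""

    def prf(tp, fp, fn):
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1

    # Per-class true positives, false positives, and false negatives.
    counts = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        counts[label] = (tp, fp, fn)

    # Micro: pool the counts over all classes, then compute the scores once.
    micro = prf(*(sum(c[i] for c in counts.values()) for i in range(3)))

    # Macro: compute the scores per class, then take the unweighted mean.
    per_class = [prf(*counts[label]) for label in labels]
    macro = tuple(sum(s[i] for s in per_class) / len(labels) for i in range(3))
    return {"micro": micro, "macro": macro}


# Toy predictions (made up for illustration, not model output).
labels = ["Rare Diseases", "Non-Rare Diseases", "Other"]
R, N, O = labels
y_true = [R, R, N, N, O, O]
y_pred = [R, N, N, N, O, R]
scores = averaged_scores(y_true, y_pred, labels)
```

When every example has exactly one true and one predicted label, each error counts as one false positive and one false negative, so micro precision, recall, and F1 all equal accuracy. Macro averages weight each class equally regardless of its size.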