---
license: apache-2.0
base_model: bert-base-uncased
tags:
- generated_from_trainer
- medical
- biology
- text-classification
- multiclass classification
metrics:
- accuracy
- precision
- recall
- f1
model-index:
- name: bert-drug-review-to-condition
  results: []
datasets:
- Zakia/drugscom_reviews
language:
- en
---
# bert-drug-review-to-condition
This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the [Zakia/drugscom_reviews](https://huggingface.co/datasets/Zakia/drugscom_reviews) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6678
- Accuracy: 0.8376
- Precision: 0.8325
- Recall: 0.8376
- F1: 0.8317
## Model description
"bert-base-uncased" fine-tuned for multiclass text classification: given a person's review of a drug they are taking, the model outputs the most likely medical condition the drug is being taken for. Training is based on predicting the 'condition' feature of the dataset from its 'review' feature.
## Intended uses & limitations
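Personal project. The model can only output one of the 100 conditions listed under Training and evaluation data, so a review about any other condition will still be assigned one of these labels.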
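The model produces one logit per class, and the predicted condition is the class with the highest score. The sketch below is not taken from the author's notebook: it shows one way to turn logits into a ranked list of conditions. `top_k_conditions` and `predict_conditions` are hypothetical helper names, and the Hub id `Marcuswas/bert-drug-review-to-condition` is an assumption based on the model name.

```python
import math


def top_k_conditions(logits, id2label, k=3):
    """Turn one row of raw logits into the k most likely (condition, probability) pairs."""
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class ids by probability, highest first, and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


def predict_conditions(review, model_id="Marcuswas/bert-drug-review-to-condition", k=3):
    """Hypothetical helper: download the model (assumed Hub id) and rank conditions."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    # Training reviews were lowercased; the uncased tokenizer would lowercase anyway.
    inputs = tokenizer(review.lower(), truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    # config.id2label maps class ids to the label strings stored with the model.
    return top_k_conditions(logits, model.config.id2label, k)
```

The label strings returned by `predict_conditions` come from the model's configuration, which may not match the mapping listed in this card exactly.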
## Training and evaluation data
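The 100 most frequent conditions in the dataset are kept as classes, with the following id-to-label mapping (label strings are reproduced as-is; several are truncated, e.g. 'ibromyalgia', 'overactive bladde' and the '... disorde' entries):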
{0: 'multiple sclerosis', 1: 'overactive bladde', 2: 'hyperhidrosis', 3: 'ibromyalgia', 4: 'menstrual disorders', 5: 'hypogonadism, male', 6: 'rosacea', 7: 'muscle spasm', 8: 'high blood pressure', 9: 'epilepsy', 10: 'psoriatic arthritis', 11: 'post traumatic stress disorde', 12: 'smoking cessation', 13: 'not listed / othe', 14: 'herpes simplex', 15: 'opiate dependence', 16: 'social anxiety disorde', 17: 'urticaria', 18: 'allergic rhinitis', 19: 'polycystic ovary syndrome', 20: 'obsessive compulsive disorde', 21: 'depression', 22: 'migraine prevention', 23: 'neuropathic pain', 24: 'ankylosing spondylitis', 25: 'skin or soft tissue infection', 26: 'constipation, drug induced', 27: 'obesity', 28: 'vaginal yeast infection', 29: 'osteoarthritis', 30: 'restless legs syndrome', 31: 'plaque psoriasis', 32: 'panic disorde', 33: 'abnormal uterine bleeding', 34: 'adhd', 35: 'high cholesterol', 36: 'diabetes, type 2', 37: 'anxiety and stress', 38: 'asthma, maintenance', 39: 'pneumonia', 40: 'schizophrenia', 41: 'opiate withdrawal', 42: 'osteoporosis', 43: 'influenza', 44: 'weight loss', 45: 'cough and nasal congestion', 46: 'birth control', 47: 'benign prostatic hyperplasia', 48: 'helicobacter pylori infection', 49: 'anxiety', 50: 'bronchitis', 51: 'rheumatoid arthritis', 52: 'narcolepsy', 53: 'generalized anxiety disorde', 54: 'insomnia', 55: 'nasal congestion', 56: 'major depressive disorde', 57: 'schizoaffective disorde', 58: 'psoriasis', 59: 'premenstrual dysphoric disorde', 60: 'bacterial vaginitis', 61: 'motion sickness', 62: 'erectile dysfunction', 63: 'constipation, chronic', 64: 'copd, maintenance', 65: 'back pain', 66: 'alcohol dependence', 67: 'migraine', 68: 'bladder infection', 69: 'underactive thyroid', 70: 'ulcerative colitis', 71: 'chronic pain', 72: 'hiv infection', 73: 'cold sores', 74: 'breast cance', 75: 'bipolar disorde', 76: 'irritable bowel syndrome', 77: 'anesthesia', 78: 'onychomycosis, toenail', 79: 'chlamydia infection', 80: 'gerd', 81: 
'endometriosis', 82: 'seizures', 83: 'alcohol withdrawal', 84: 'bowel preparation', 85: 'hot flashes', 86: 'bacterial infection', 87: 'inflammatory conditions', 88: 'constipation', 89: 'headache', 90: 'urinary tract infection', 91: 'sinusitis', 92: 'emergency contraception', 93: 'cough', 94: 'acne', 95: 'atrial fibrillation', 96: 'pain', 97: 'nausea/vomiting', 98: 'hepatitis c', 99: 'postmenopausal symptoms'}
The 'review' text is lowercased, and only examples whose review is longer than 16 characters are kept.
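The selection described above (the 100 most frequent conditions, lowercased reviews, reviews longer than 16 characters) can be sketched in plain Python. This is an illustration under assumptions, not the notebook's code: rows are assumed to be dicts with 'review' and 'condition' keys, rows without a condition are skipped, and class ids are assigned by frequency rank, so they will not reproduce the id-to-label mapping above. `select_examples` is a hypothetical name.

```python
from collections import Counter


def select_examples(rows, n_classes=100, min_chars=16):
    """Keep reviews of the n most frequent conditions that are longer than min_chars.

    `rows` is assumed to be an iterable of dicts with 'review' and 'condition'
    keys; rows with an empty or missing condition are ignored.
    """
    rows = list(rows)
    # Count how often each (non-empty) condition occurs.
    counts = Counter(row["condition"] for row in rows if row["condition"])
    # Assign ids by frequency rank (an assumption: the card's ids are ordered differently).
    label2id = {cond: i for i, (cond, _) in enumerate(counts.most_common(n_classes))}

    kept = []
    for row in rows:
        review = row["review"].lower()
        # Keep only the selected conditions and reviews longer than min_chars.
        if row["condition"] in label2id and len(review) > min_chars:
            kept.append({"text": review, "label": label2id[row["condition"]]})

    id2label = {i: cond for cond, i in label2id.items()}
    return kept, id2label
```

Whether the original code counted condition frequencies before or after the length filter is not stated; this sketch counts before filtering.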
## Training procedure
The full training code is available at: https://github.com/mlafuentem/Marcuswas-bert-drug-review-to-condition/blob/main/Exercise_classification_conditions_code.ipynb
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
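The step counts in the training results follow from these settings. Assuming a single device, no gradient accumulation and no warmup (none is listed), 13,390 steps per epoch at a batch size of 8 implies between 107,113 and 107,120 training examples, 3 epochs give 40,170 steps in total, and the linear scheduler decays the learning rate from 5e-05 to 0 over those steps. The sketch below mirrors the formula of the linear schedule in Transformers (`get_linear_schedule_with_warmup`); the function names are hypothetical.

```python
import math


def steps_per_epoch(n_examples, batch_size=8):
    # The last, partial batch still counts as a step (no drop_last).
    return math.ceil(n_examples / batch_size)


def implied_train_size(steps, batch_size=8):
    # Range of dataset sizes n for which ceil(n / batch_size) == steps.
    return (steps - 1) * batch_size + 1, steps * batch_size


def linear_lr(step, base_lr=5e-5, total_steps=40170, warmup_steps=0):
    # Linear warmup (none here), then linear decay from base_lr to 0.
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

For example, `implied_train_size(13390)` gives `(107113, 107120)`, and at the end of epoch 1 (step 13390) the learning rate has dropped to two thirds of its initial value, about 3.33e-05.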
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:---------:|:------:|:------:|
| 0.8469 | 1.0 | 13390 | 0.8275 | 0.7673 | 0.7686 | 0.7673 | 0.7551 |
| 0.6319 | 2.0 | 26780 | 0.6895 | 0.8094 | 0.8090 | 0.8094 | 0.7978 |
| 0.4116 | 3.0 | 40170 | 0.6678 | 0.8376 | 0.8325 | 0.8376 | 0.8317 |
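In every row of the table above, recall equals accuracy. For single-label multiclass predictions this is exactly what support-weighted averaging produces (weighted recall is the sum over classes of (n_c / N) * (TP_c / n_c), which equals the total number of correct predictions divided by N), so the precision, recall and F1 columns were most likely weighted averages, as with scikit-learn's `average="weighted"`. The card does not state this, so treat it as an inference. A dependency-free sketch of those metrics, with the hypothetical name `weighted_metrics`:

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 for single-label classes."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    predicted = Counter(y_pred)  # number of predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class

    accuracy = sum(tp.values()) / n
    precision = recall = f1 = 0.0
    for cls, n_c in support.items():
        # Precision is set to 0 for a class that is never predicted.
        p_c = tp[cls] / predicted[cls] if predicted[cls] else 0.0
        r_c = tp[cls] / n_c
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        # Each class contributes in proportion to its support.
        w = n_c / n
        precision += w * p_c
        recall += w * r_c
        f1 += w * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

In a Trainer setup, a function like this would typically be called from `compute_metrics`, passing `eval_pred.label_ids` and the argmax of `eval_pred.predictions` over the class axis.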
### Framework versions
- Transformers 4.40.0
- Pytorch 2.2.1+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1