metadata
license: apache-2.0
base_model: bert-base-uncased
tags:
  - generated_from_trainer
  - medical
  - biology
  - text-classification
  - multiclass classification
metrics:
  - accuracy
  - precision
  - recall
  - f1
model-index:
  - name: bert-drug-review-to-condition
    results: []
datasets:
  - Zakia/drugscom_reviews
language:
  - en

bert-drug-review-to-condition

This model is a fine-tuned version of bert-base-uncased on the following dataset: Kallumadi, Surya and Gräßer, Felix. (2018). Drug Reviews (Drugs.com). UCI Machine Learning Repository. https://doi.org/10.24432/C5SK5S. It achieves the following results on the evaluation set:

  • Loss: 0.6678
  • Accuracy: 0.8376
  • Precision: 0.8325
  • Recall: 0.8376
  • F1: 0.8317
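
Recall and accuracy are identical above (0.8376). That is what support-weighted averaging over classes produces, so the sketch below assumes weighted averaging was used; the card itself does not state the averaging mode. The function name `weighted_metrics` is illustrative and not taken from the training notebook.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1.

    Assumption: the card's metrics use 'weighted' averaging, where each
    class contributes in proportion to its number of true examples.
    """
    total = len(y_true)
    support = Counter(y_true)
    precision = recall = f1 = 0.0
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        actual = support[label]
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = actual / total
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With weighted averaging, each class's recall is multiplied by its share of the examples, so the sum reduces to correct predictions divided by total examples, which is accuracy.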

Model description

bert-base-uncased fine-tuned for multiclass text classification: given a drug review as input text, the model outputs the most likely medical condition of the reviewer. It was trained to predict the 'condition' feature from the 'review' feature (i.e., people review the drugs they are taking for their condition).

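A minimal inference sketch using the Transformers `pipeline` API. The repository id `Marcuswas/bert-drug-review-to-condition` is an assumption inferred from the model name and the author's GitHub link, and `prepare_review` and `predict_condition` are hypothetical helper names. Depending on how the checkpoint's config was saved, the returned label may be a condition name or a generic `LABEL_n` id.

```python
def prepare_review(review: str) -> str:
    """Lowercase the review, mirroring the preprocessing described in this card."""
    return review.lower()


def predict_condition(review: str,
                      model_id: str = "Marcuswas/bert-drug-review-to-condition"):
    """Return (label, score) for the most likely condition.

    Assumption: model_id is a guess at the Hub repository id.
    """
    # Imported lazily so prepare_review works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # truncation=True cuts reviews longer than the model's maximum input length.
    result = classifier(prepare_review(review), truncation=True)[0]
    return result["label"], result["score"]


if __name__ == "__main__":
    print(predict_condition("I have been taking this for my migraines for two months."))
```
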
Intended uses & limitations

Personal project

Training and evaluation data

The 100 most frequent conditions in the dataset are selected:

{0: 'multiple sclerosis', 1: 'overactive bladde', 2: 'hyperhidrosis', 3: 'ibromyalgia', 4: 'menstrual disorders', 5: 'hypogonadism, male', 6: 'rosacea', 7: 'muscle spasm', 8: 'high blood pressure', 9: 'epilepsy', 10: 'psoriatic arthritis', 11: 'post traumatic stress disorde', 12: 'smoking cessation', 13: 'not listed / othe', 14: 'herpes simplex', 15: 'opiate dependence', 16: 'social anxiety disorde', 17: 'urticaria', 18: 'allergic rhinitis', 19: 'polycystic ovary syndrome', 20: 'obsessive compulsive disorde', 21: 'depression', 22: 'migraine prevention', 23: 'neuropathic pain', 24: 'ankylosing spondylitis', 25: 'skin or soft tissue infection', 26: 'constipation, drug induced', 27: 'obesity', 28: 'vaginal yeast infection', 29: 'osteoarthritis', 30: 'restless legs syndrome', 31: 'plaque psoriasis', 32: 'panic disorde', 33: 'abnormal uterine bleeding', 34: 'adhd', 35: 'high cholesterol', 36: 'diabetes, type 2', 37: 'anxiety and stress', 38: 'asthma, maintenance', 39: 'pneumonia', 40: 'schizophrenia', 41: 'opiate withdrawal', 42: 'osteoporosis', 43: 'influenza', 44: 'weight loss', 45: 'cough and nasal congestion', 46: 'birth control', 47: 'benign prostatic hyperplasia', 48: 'helicobacter pylori infection', 49: 'anxiety', 50: 'bronchitis', 51: 'rheumatoid arthritis', 52: 'narcolepsy', 53: 'generalized anxiety disorde', 54: 'insomnia', 55: 'nasal congestion', 56: 'major depressive disorde', 57: 'schizoaffective disorde', 58: 'psoriasis', 59: 'premenstrual dysphoric disorde', 60: 'bacterial vaginitis', 61: 'motion sickness', 62: 'erectile dysfunction', 63: 'constipation, chronic', 64: 'copd, maintenance', 65: 'back pain', 66: 'alcohol dependence', 67: 'migraine', 68: 'bladder infection', 69: 'underactive thyroid', 70: 'ulcerative colitis', 71: 'chronic pain', 72: 'hiv infection', 73: 'cold sores', 74: 'breast cance', 75: 'bipolar disorde', 76: 'irritable bowel syndrome', 77: 'anesthesia', 78: 'onychomycosis, toenail', 79: 'chlamydia infection', 80: 'gerd', 81: 'endometriosis', 82: 'seizures', 83: 'alcohol withdrawal', 84: 'bowel preparation', 85: 'hot flashes', 86: 'bacterial infection', 87: 'inflammatory conditions', 88: 'constipation', 89: 'headache', 90: 'urinary tract infection', 91: 'sinusitis', 92: 'emergency contraception', 93: 'cough', 94: 'acne', 95: 'atrial fibrillation', 96: 'pain', 97: 'nausea/vomiting', 98: 'hepatitis c', 99: 'postmenopausal symptoms'}

The 'review' feature is lowercased, and only examples with more than 16 characters are kept.
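
The selection described above can be sketched in plain Python. This is an illustration under assumptions, not the notebook's code: `select_examples` is a hypothetical name, rows are assumed to be dicts with 'review' and 'condition' keys, and the order of operations (frequency ranking before the length filter) is a guess.

```python
from collections import Counter


def select_examples(rows, top_n=100, min_chars=16):
    """Keep lowercased reviews longer than min_chars from the top_n conditions.

    Assumption: conditions are ranked by frequency before the length filter
    is applied; the card does not say which step comes first.
    """
    counts = Counter(row["condition"] for row in rows)
    keep = {condition for condition, _ in counts.most_common(top_n)}
    selected = []
    for row in rows:
        review = row["review"].lower()
        if row["condition"] in keep and len(review) > min_chars:
            selected.append({"review": review, "condition": row["condition"]})
    return selected
```

The condition names would then be mapped to integer ids 0 to 99 to serve as classification labels.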

Training procedure

See code available at: https://github.com/mlafuentem/Marcuswas-bert-drug-review-to-condition/blob/main/Exercise_classification_conditions_code.ipynb

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
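
As a rough sketch, the hyperparameters above map onto keyword arguments of `transformers.TrainingArguments` as follows. The `output_dir` value is hypothetical, and treating the batch sizes as per-device values assumes single-device training. The `linear_lr` helper illustrates what a linear schedule does, assuming no warmup (none is listed) and the 40170 total steps reported in the training results.

```python
# Keyword arguments that could be passed as TrainingArguments(**TRAINING_KWARGS).
TRAINING_KWARGS = {
    "output_dir": "bert-drug-review-to-condition",  # hypothetical path
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,  # assumes a single device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3.0,
}


def linear_lr(step, total_steps=40170, base_lr=5e-5, warmup_steps=0):
    """Learning rate at a given step under linear warmup and linear decay.

    Assumption: warmup_steps is 0, since the card lists no warmup setting.
    """
    if warmup_steps and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

Under these assumptions the learning rate starts at 5e-05 and decays linearly to zero by the final step, reaching half its initial value midway through the second epoch.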

Training results

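| Training Loss | Epoch | Step  | Validation Loss | Accuracy | Precision | Recall | F1     |
|---------------|-------|-------|-----------------|----------|-----------|--------|--------|
| 0.8469        | 1.0   | 13390 | 0.8275          | 0.7673   | 0.7686    | 0.7673 | 0.7551 |
| 0.6319        | 2.0   | 26780 | 0.6895          | 0.8094   | 0.8090    | 0.8094 | 0.7978 |
| 0.4116        | 3.0   | 40170 | 0.6678          | 0.8376   | 0.8325    | 0.8376 | 0.8317 |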

Framework versions

  • Transformers 4.40.0
  • Pytorch 2.2.1+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1