---
license: apache-2.0
datasets:
- Zakia/drugscom_reviews
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- health
- medicine
- patient reviews
- drug reviews
- depression
- text classification
---

# Model Card for Zakia/distilbert-drugscom_depression_reviews

This model is a DistilBERT-based classifier fine-tuned on drug reviews for the medical condition of depression from Drugs.com.
The dataset used for fine-tuning is the [Zakia/drugscom_reviews](https://huggingface.co/datasets/Zakia/drugscom_reviews) dataset, filtered for the condition 'Depression'.
The base model for fine-tuning was [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased).

## Model Details

### Model Description

- **Developed by:** Zakia
- **Model type:** Text Classification
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** distilbert-base-uncased

## Uses

### Direct Use

This model is intended to classify drug reviews into high or low quality, aiding in the analysis of patient feedback on depression medications.

### Out-of-Scope Use

This model is not designed to diagnose or treat depression or to replace professional medical advice.

## Bias, Risks, and Limitations

The model may inherit biases present in the dataset and should not be used as the sole decision-maker for healthcare or treatment options.

### Recommendations

Use the model as a tool to support, not replace, professional judgment.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch.nn.functional as F

model_name = "Zakia/distilbert-drugscom_depression_reviews"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define a function to print predictions with labels
def print_predictions(review_text, model, tokenizer):
    inputs = tokenizer(review_text, return_tensors="pt")
    outputs = model(**inputs)
    predictions = F.softmax(outputs.logits, dim=-1)
    # LABEL_0 is for low quality and LABEL_1 for high quality
    print(f"Review: \"{review_text}\"")
    print(f"Prediction: {{'LABEL_0 (Low quality)': {predictions[0][0].item():.4f}, 'LABEL_1 (High quality)': {predictions[0][1].item():.4f}}}\n")

# High quality review example
high_quality_review = "This medication has changed my life for the better. I've experienced no side effects and my symptoms of depression have significantly decreased."
print_predictions(high_quality_review, model, tokenizer)

# Low quality review example
low_quality_review = "I've had a terrible experience with this medication. It made me feel nauseous and I didn't notice any improvement in my condition."
print_predictions(low_quality_review, model, tokenizer)
```

## Training Details

### Training Data

The model was fine-tuned on a dataset of drug reviews specifically related to depression, filtered from Drugs.com.
This dataset is accessible from [Zakia/drugscom_reviews](https://huggingface.co/datasets/Zakia/drugscom_reviews) on Hugging Face datasets (condition = 'Depression') for the 'train' split.
Number of records in the train dataset: 9069 rows.
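
For reference, the filtered training split can be reproduced roughly as follows (a minimal sketch, assuming the dataset exposes a 'train' split and a `condition` column as described on its dataset card):

```python
from datasets import load_dataset

# Load the Drugs.com reviews dataset and keep only reviews for the
# 'Depression' condition (column name assumed from the dataset card).
train_reviews = load_dataset("Zakia/drugscom_reviews", split="train")
depression_train = train_reviews.filter(lambda row: row["condition"] == "Depression")
print(len(depression_train))  # expected to be 9069 rows
```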

### Training Procedure

#### Preprocessing

The reviews were cleaned and preprocessed to remove quotes and HTML tags and to decode HTML entities.
A new column called 'high_quality_review' was added to the reviews.
'high_quality_review' was set to 1 if rating > 5 (a positive rating) and usefulCount > the 75th percentile of usefulCount (65), and to 0 otherwise.
Train dataset high_quality_review counts: Counter({0: 6949, 1: 2120})
The training data was then balanced by downsampling the low quality reviews (high_quality_review = 0), giving a final training set of 4240 reviews:
Train dataset high_quality_review counts: Counter({0: 2120, 1: 2120})
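
A minimal sketch of this labelling and balancing step is shown below. The helper name and the pandas-based approach are illustrative rather than the exact training script, and the column names (`review`, `rating`, `usefulCount`) follow the dataset card:

```python
import html
import re
from collections import Counter

import pandas as pd

def label_and_balance(df: pd.DataFrame, useful_count_threshold: float = 65) -> pd.DataFrame:
    """Illustrative version of the preprocessing described above."""
    df = df.copy()
    # Clean the review text: strip surrounding quotes, decode HTML entities, drop HTML tags.
    df["review"] = (
        df["review"]
        .str.strip('"')
        .apply(lambda text: re.sub(r"<[^>]+>", "", html.unescape(text)))
    )
    # High quality = positive rating (> 5) and usefulCount above the 75th percentile (65 on train).
    df["high_quality_review"] = (
        (df["rating"] > 5) & (df["usefulCount"] > useful_count_threshold)
    ).astype(int)
    # Balance the classes by downsampling the majority (low quality) class.
    n_high = int((df["high_quality_review"] == 1).sum())
    low = df[df["high_quality_review"] == 0].sample(n=n_high, random_state=42)
    high = df[df["high_quality_review"] == 1]
    balanced = pd.concat([low, high]).sample(frac=1, random_state=42).reset_index(drop=True)
    print(Counter(balanced["high_quality_review"]))
    return balanced
```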

#### Training Hyperparameters

- **Learning Rate:** 3e-5
- **Batch Size:** 16
- **Epochs:** 1
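
These settings map onto the Hugging Face `Trainer` API roughly as follows (a sketch only, not the exact training script; `train_dataset` and `eval_dataset` are assumed to be the tokenized, balanced splits prepared in the preprocessing step):

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Base model with a binary classification head (LABEL_0 = low quality, LABEL_1 = high quality).
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

training_args = TrainingArguments(
    output_dir="distilbert-drugscom_depression_reviews",
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    num_train_epochs=1,
)

# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```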

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was tested on a dataset of drug reviews specifically related to depression, filtered from Drugs.com.
This dataset is accessible from [Zakia/drugscom_reviews](https://huggingface.co/datasets/Zakia/drugscom_reviews) on Hugging Face datasets (condition = 'Depression') for the 'test' split.
Number of records in the test dataset: 3095 rows.

#### Preprocessing

The reviews were cleaned and preprocessed to remove quotes and HTML tags and to decode HTML entities.
A new column called 'high_quality_review' was added to the reviews.
'high_quality_review' was set to 1 if rating > 5 (a positive rating) and usefulCount > the 75th percentile of usefulCount (65), and to 0 otherwise.
Note: the 75th percentile of usefulCount is based on the train dataset.
Test dataset high_quality_review counts: Counter({0: 2365, 1: 730})

#### Metrics

The model's performance was evaluated based on accuracy.
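
Accuracy can be computed along the following lines (a sketch using the `evaluate` library; the logits and labels are assumed to come from running the fine-tuned model over the preprocessed test split):

```python
import numpy as np
import evaluate

# Accuracy = fraction of reviews whose predicted label
# (0 = low quality, 1 = high quality) matches the reference label.
accuracy_metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy_metric.compute(predictions=predictions, references=labels)
```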

### Results

The fine-tuning process yielded the following results:

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.38          | 0.80            | 0.77     |

The model classifies drug reviews as high or low quality with an accuracy of 77% (low quality: high_quality_review=0; high quality: high_quality_review=1).

## Technical Specifications

### Model Architecture and Objective

The DistilBERT model architecture was used, with a binary classification head for high and low quality review classification.

### Compute Infrastructure

The model was trained using a T4 GPU on Google Colab.

#### Hardware

T4 GPU via Google Colab.

## Citation

If you use this model, please cite the original DistilBERT paper:

**BibTeX:**

```bibtex
@article{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  journal={arXiv preprint arXiv:1910.01108},
  year={2019}
}
```

**APA:**

Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.

## Glossary

- **Low Quality Review:** high_quality_review=0
- **High Quality Review:** high_quality_review=1

## More Information

For further queries or issues with the model, please use the [discussions section on this model's Hugging Face page](https://huggingface.co/Zakia/distilbert-drugscom_depression_reviews/discussions).

## Model Card Authors

- Zakia

## Model Card Contact

For more information or inquiries regarding this model, please use the [discussions section on this model's Hugging Face page](https://huggingface.co/Zakia/distilbert-drugscom_depression_reviews/discussions).