---
license: apache-2.0
datasets:
- Zakia/drugscom_reviews
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- health
- medicine
- patient reviews
- drug reviews
- depression
- text classification
widget:
- text: "This medication has changed my life for the better. I've experienced no side effects and my symptoms of depression have significantly decreased."
example_title: "Example 1"
- text: "I've had a terrible experience with this medication. It made me feel nauseous and I didn't notice any improvement in my condition."
example_title: "Example 2"
---
# Model Card for Zakia/distilbert-drugscom_depression_reviews
This model is a DistilBERT-based classifier fine-tuned on Drugs.com drug reviews for the medical condition 'Depression'.
The dataset used for fine-tuning is the [Zakia/drugscom_reviews](https://huggingface.co/datasets/Zakia/drugscom_reviews) dataset, filtered for the condition 'Depression'.
The base model used for fine-tuning was [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased).
## Model Details
### Model Description
- Developed by: Zakia
- Model type: Text Classification
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: distilbert-base-uncased
## Uses
### Direct Use
This model is intended to classify drug reviews as high or low quality, aiding the analysis of patient feedback on depression medications.
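For quick experimentation, the model can also be loaded through the `pipeline` API. The snippet below is a minimal sketch; the labels follow the default `LABEL_0` (low quality) / `LABEL_1` (high quality) mapping described in the code example further down this card.

```python
from transformers import pipeline

# Load the fine-tuned classifier; LABEL_0 = low quality, LABEL_1 = high quality
classifier = pipeline(
    "text-classification",
    model="Zakia/distilbert-drugscom_depression_reviews",
)

result = classifier(
    "This medication has changed my life for the better. "
    "I've experienced no side effects and my symptoms of depression have significantly decreased."
)
print(result)  # e.g. [{'label': 'LABEL_1', 'score': ...}]
```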
### Out-of-Scope Use
This model is not designed to diagnose or treat depression or to replace professional medical advice.
## Bias, Risks, and Limitations
The model may inherit biases present in the dataset and should not be used as the sole decision-maker for healthcare or treatment options.
### Recommendations
Use the model as a tool to support, not replace, professional judgment.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch.nn.functional as F

model_name = "Zakia/distilbert-drugscom_depression_reviews"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define a function to print predictions with labels
def print_predictions(review_text, model, tokenizer):
    inputs = tokenizer(review_text, return_tensors="pt")
    outputs = model(**inputs)
    predictions = F.softmax(outputs.logits, dim=-1)
    # LABEL_0 is for low quality and LABEL_1 for high quality
    print(f"Review: \"{review_text}\"")
    print(f"Prediction: {{'LABEL_0 (Low quality)': {predictions[0][0].item():.4f}, 'LABEL_1 (High quality)': {predictions[0][1].item():.4f}}}\n")

# High quality review example
high_quality_review = "This medication has changed my life for the better. I've experienced no side effects and my symptoms of depression have significantly decreased."
print_predictions(high_quality_review, model, tokenizer)

# Low quality review example
low_quality_review = "I've had a terrible experience with this medication. It made me feel nauseous and I didn't notice any improvement in my condition."
print_predictions(low_quality_review, model, tokenizer)
```
## Training Details
### Training Data
The model was fine-tuned on a dataset of drug reviews specifically related to depression, filtered from Drugs.com.
This dataset is available as [Zakia/drugscom_reviews](https://huggingface.co/datasets/Zakia/drugscom_reviews) on Hugging Face Datasets; the 'train' split was filtered to condition = 'Depression'.
Number of records in the filtered train dataset: 9,069 rows.
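As a hedged sketch, the filtered training split can be reproduced roughly as follows (assuming the 'condition' column name exposed by the dataset):

```python
from datasets import load_dataset

# Load the 'train' split and keep only reviews for the 'Depression' condition
dataset = load_dataset("Zakia/drugscom_reviews", split="train")
depression_train = dataset.filter(lambda row: row["condition"] == "Depression")
print(len(depression_train))  # expected: 9069 rows, per this model card
```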
### Training Procedure
#### Preprocessing
The reviews were cleaned and preprocessed to remove surrounding quotes and HTML tags and to decode HTML entities.
A new column called 'high_quality_review' was then added:
'high_quality_review' was set to 1 if rating > 5 (a positive rating) and usefulCount was above the 75th percentile of usefulCount (65); otherwise it was set to 0.
Train dataset high_quality_review counts: Counter({0: 6949, 1: 2120})
The training data was then balanced by downsampling low-quality reviews (high_quality_review = 0).
The final training data had 4,240 rows of reviews:
Train dataset high_quality_review counts: Counter({0: 2120, 1: 2120})
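The labeling and balancing steps can be summarized by the pandas sketch below. Column names 'rating' and 'usefulCount' come from the dataset as described above; the random seed and function names are illustrative, not the exact code used for training.

```python
import pandas as pd

def add_quality_label(df: pd.DataFrame, useful_count_q75: float = 65) -> pd.DataFrame:
    """Label a review as high quality (1) when rating > 5 and usefulCount exceeds
    the 75th percentile of usefulCount (65, computed on the train split); else 0."""
    df = df.copy()
    df["high_quality_review"] = (
        (df["rating"] > 5) & (df["usefulCount"] > useful_count_q75)
    ).astype(int)
    return df

def balance_by_downsampling(df: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Downsample low-quality reviews (label 0) to match the number of high-quality ones."""
    high = df[df["high_quality_review"] == 1]
    low = df[df["high_quality_review"] == 0].sample(n=len(high), random_state=seed)
    return pd.concat([high, low]).sample(frac=1, random_state=seed)  # shuffle
```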
#### Training Hyperparameters
- Learning Rate: 3e-5
- Batch Size: 16
- Epochs: 1
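As a rough sketch, these hyperparameters map onto `transformers` `TrainingArguments` as shown below; the output directory and the omitted arguments (datasets, evaluation settings) are illustrative, not the exact values used.

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Binary classification head on top of the base DistilBERT model
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

training_args = TrainingArguments(
    output_dir="distilbert-drugscom_depression_reviews",  # illustrative path
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    num_train_epochs=1,
)

# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=..., eval_dataset=...)  # datasets omitted here
# trainer.train()
```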
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
The model was tested on a dataset of drug reviews specifically related to depression, filtered from Drugs.com.
This dataset is available as [Zakia/drugscom_reviews](https://huggingface.co/datasets/Zakia/drugscom_reviews) on Hugging Face Datasets; the 'test' split was filtered to condition = 'Depression'.
Number of records in the filtered test dataset: 3,095 rows.
#### Preprocessing
The reviews were cleaned and preprocessed to remove surrounding quotes and HTML tags and to decode HTML entities.
A new column called 'high_quality_review' was added in the same way as for training:
'high_quality_review' was set to 1 if rating > 5 (a positive rating) and usefulCount was above the 75th percentile of usefulCount (65); otherwise it was set to 0.
Note: the 75th percentile of usefulCount is based on the train dataset.
Test dataset high_quality_review counts: Counter({0: 2365, 1: 730})
#### Metrics
The model's performance was evaluated based on accuracy.
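Accuracy can be computed, for example, with the `evaluate` library; the snippet below is a sketch of the standard pattern, not the exact evaluation script used.

```python
import evaluate
import numpy as np

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # Convert logits to predicted class ids, then compare against the labels
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```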
### Results
The fine-tuning process yielded the following results:
| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | 0.38 | 0.80 | 0.77 |
The model classifies drug reviews as high or low quality with an accuracy of 77% on the test set.
- Low quality: high_quality_review = 0
- High quality: high_quality_review = 1
## Technical Specifications
### Model Architecture and Objective
The DistilBERT architecture was used, with a binary classification head to distinguish high-quality from low-quality reviews.
### Compute Infrastructure
The model was trained using a T4 GPU on Google Colab.
#### Hardware
T4 GPU via Google Colab.
## Citation
If you use this model, please cite the original DistilBERT paper:
**BibTeX:**
```bibtex
@article{sanh2019distilbert,
title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
journal={arXiv preprint arXiv:1910.01108},
year={2019}
}
```
**APA:**
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
## Glossary
- Low Quality Review: high_quality_review=0
- High Quality Review: high_quality_review=1
## More Information
For further queries or issues with the model, please use the [discussions section on this model's Hugging Face page](https://huggingface.co/Zakia/distilbert-drugscom_depression_reviews/discussions).
## Model Card Authors
- Zakia
## Model Card Contact
For more information or inquiries regarding this model, please use the [discussions section on this model's Hugging Face page](https://huggingface.co/Zakia/distilbert-drugscom_depression_reviews/discussions).