---
license: mit
datasets:
- Aditya1010/17k-hotel-reviews-dataset
metrics:
- accuracy
base_model:
- distilbert/distilbert-base-uncased
pipeline_tag: text-classification
library_name: transformers
tags:
- Sentiment Analysis
- DistilBERT
- Text Classification
- Hotel Reviews
---

# Hotel Review Classifier

This model classifies the sentiment of hotel reviews as **positive** or **negative**. It was fine-tuned from [distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) on the [17k Hotel Reviews Dataset](https://huggingface.co/datasets/Aditya1010/17k-hotel-reviews-dataset).

## Model Details

- **Model Type**: DistilBERT-based model for sequence classification
- **Model Architecture**: `distilbert-base-uncased`
- **Number of Parameters**: Approximately 66M
- **Training Dataset**: The model was trained on the `17k-hotel-reviews-dataset`, which contains 17,000 hotel reviews with sentiment labels (positive/negative).
- **Fine-Tuning Task**: Sentiment analysis for hotel reviews (positive or negative sentiment)
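
You can sanity-check the parameter count by loading the model and summing its parameter sizes; a minimal sketch using standard Transformers/PyTorch calls:

```python
from transformers import AutoModelForSequenceClassification

# DistilBERT-base models typically report roughly 66-67M parameters
model = AutoModelForSequenceClassification.from_pretrained("kmack/HotelReviewClassifier")
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters")
```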

## Training Data

- **Dataset**: [17k Hotel Reviews Dataset](https://huggingface.co/datasets/Aditya1010/17k-hotel-reviews-dataset)
- **Data Description**: The dataset consists of 17,000 hotel reviews, each labeled with a sentiment (positive/negative).
- **Preprocessing**: The reviews were cleaned to remove unwanted characters and URLs (an illustrative sketch follows below).
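
The card does not specify the exact cleaning rules, so the following is only an illustrative sketch of the kind of preprocessing described (URL removal plus stripping of stray characters); the real pipeline may differ:

```python
import re

def clean_review(text: str) -> str:
    """Illustrative cleaning: drop URLs, remove unwanted characters, normalize whitespace."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"[^A-Za-z0-9.,!?'\s]", " ", text)    # drop unwanted characters
    text = re.sub(r"\s+", " ", text).strip()            # collapse whitespace
    return text

print(clean_review("Great stay!! Check http://example.com <br> 10/10"))
```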

## Training Details

- **Training Framework**: Hugging Face Transformers and PyTorch
- **Learning Rate**: 2e-5
- **Epochs**: 3
- **Batch Size**: 16
- **Optimizer**: AdamW
- **Training Time**: Approximately 2 hours on a GPU
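
The card does not include the training script, so the following is only a minimal sketch of a setup matching these hyperparameters with the `Trainer` API (which uses AdamW by default); the `text`/`label` column names and the `train` split are assumptions about the dataset layout:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hyperparameters below mirror the card; column/split names are assumptions.
dataset = load_dataset("Aditya1010/17k-hotel-reviews-dataset")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="hotel-review-classifier",
    learning_rate=2e-5,                # from the card
    num_train_epochs=3,                # from the card
    per_device_train_batch_size=16,    # from the card
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"])
trainer.train()
```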

## Usage

To run inference with the model, use the following code:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("kmack/HotelReviewClassifier")
tokenizer = AutoTokenizer.from_pretrained("kmack/HotelReviewClassifier")
model.eval()

# Example review for prediction
review = "This is the best hotel I've ever stayed in!"

# Tokenize the input text
inputs = tokenizer(review, return_tensors="pt", padding=True, truncation=True)

# Run the model without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Get the predicted label (0 = negative, 1 = positive)
prediction = torch.argmax(outputs.logits, dim=-1).item()
print(f"Predicted sentiment: {'Positive' if prediction == 1 else 'Negative'}")
```
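
For quick experiments, the same model also works with the high-level `pipeline` API, which handles tokenization and label mapping for you; note that the label names it prints depend on the `id2label` mapping stored in the model config:

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="kmack/HotelReviewClassifier")
print(classifier("The room was dirty and the staff were rude."))
# e.g. [{'label': 'LABEL_0', 'score': 0.99}] if no id2label mapping is configured
```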

## Citation

If you use this model in your research, please cite the following:

```bibtex
@misc{hotel_review_classifier,
  author = {Kmack},
  title  = {Hotel Review Classifier},
  year   = {2024},
  url    = {https://huggingface.co/kmack/HotelReviewClassifier}
}
```