---
license: mit
datasets:
  - Aditya1010/17k-hotel-reviews-dataset
metrics:
  - accuracy
base_model:
  - distilbert/distilbert-base-uncased
pipeline_tag: text-classification
library_name: transformers
tags:
  - Sentiment Analysis
  - DistilBERT
  - Text Classification
  - Hotel Reviews
---

Hotel Review Classifier

This model classifies hotel reviews by sentiment, predicting whether a review is positive or negative. It was fine-tuned from distilbert-base-uncased, the uncased base version of Hugging Face's DistilBERT, on the 17k Hotel Reviews Dataset.

Model Details

  • Model Type: DistilBERT-based model for sequence classification
  • Model Architecture: distilbert-base-uncased
  • Number of Parameters: Approximately 66M parameters
  • Training Dataset: The model was trained on the 17k-hotel-reviews-dataset, which contains 17,000 hotel reviews with labels for sentiment (positive/negative).
  • Fine-Tuning Task: Sentiment analysis for hotel reviews (positive or negative sentiment)
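The "approximately 66M parameters" figure can be checked by hand. The sketch below counts weights from the standard distilbert-base-uncased sizes (30,522-token vocabulary, 512 positions, hidden size 768, feed-forward size 3,072, 6 layers). It assumes the fine-tuned model uses the `DistilBertForSequenceClassification` head (a 768→768 pre-classifier followed by a 768→2 classifier); the card does not list the head explicitly, so treat that part as an assumption.

```python
# Parameter count of distilbert-base-uncased plus a 2-label classification head.
# Sizes are the standard DistilBERT-base values; the head layout is an assumption.
VOCAB_SIZE = 30522   # WordPiece vocabulary of the uncased model
MAX_POSITIONS = 512  # learned position embeddings
DIM = 768            # hidden size
HIDDEN_DIM = 3072    # feed-forward inner size
N_LAYERS = 6         # transformer blocks
NUM_LABELS = 2       # negative / positive


def linear(n_in: int, n_out: int) -> int:
    """Weights plus bias of a fully connected layer."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


def distilbert_params() -> int:
    """Embeddings plus N_LAYERS transformer blocks (no token-type embeddings)."""
    embeddings = VOCAB_SIZE * DIM + MAX_POSITIONS * DIM + layer_norm(DIM)
    attention = 4 * linear(DIM, DIM)  # query, key, value and output projections
    ffn = linear(DIM, HIDDEN_DIM) + linear(HIDDEN_DIM, DIM)
    block = attention + ffn + 2 * layer_norm(DIM)  # two LayerNorms per block
    return embeddings + N_LAYERS * block


def classifier_params() -> int:
    """Assumed head: pre_classifier (DIM -> DIM) then classifier (DIM -> NUM_LABELS)."""
    return linear(DIM, DIM) + linear(DIM, NUM_LABELS)


if __name__ == "__main__":
    base = distilbert_params()
    total = base + classifier_params()
    print(f"encoder: {base:,}, with head: {total:,}")
```

This gives 66,362,880 parameters for the encoder and 66,955,010 with the assumed head. To confirm against the real checkpoint, run `sum(p.numel() for p in model.parameters())` on the model loaded in the Usage section.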

Training Data

  • Dataset: 17k Hotel Reviews Dataset
  • Data Description: The dataset consists of 17,000 hotel reviews, each labeled with a sentiment (positive/negative).
  • Preprocessing: The dataset was preprocessed by cleaning the reviews to remove unwanted characters and URLs.
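The card does not say which characters counted as "unwanted", so the following is only a sketch of this kind of cleaning step, not the author's actual script. It assumes URLs start with `http://`, `https://` or `www.`, and keeps Unicode word characters, whitespace and basic punctuation (so accented letters survive while symbols and emoji are dropped). `clean_review` is a hypothetical helper name.

```python
import re

# Hypothetical cleaning rules: the card only says "unwanted characters and URLs"
# were removed, so the exact patterns below are assumptions.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
# Keep Unicode word characters, whitespace and common punctuation; drop the rest.
UNWANTED_CHARS = re.compile(r"[^\w\s.,!?'\-]")
MULTI_SPACE = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Strip URLs and unwanted characters, then collapse runs of whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = UNWANTED_CHARS.sub(" ", text)
    return MULTI_SPACE.sub(" ", text).strip()


if __name__ == "__main__":
    raw = "Loved the café!!! Pics: https://example.com/pics ★★★★★"
    print(clean_review(raw))  # Loved the café!!! Pics
```

Replacing removed spans with a space and collapsing whitespace afterwards avoids gluing neighbouring words together.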

Training Details

  • Training Framework: Hugging Face Transformers and PyTorch
  • Learning Rate: 2e-5
  • Epochs: 3
  • Batch Size: 16
  • Optimizer: AdamW
  • Training Time: Approximately 2 hours on a GPU
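The hyperparameters above map directly onto fields of `transformers.TrainingArguments`. The sketch below records them under those field names and estimates how many optimizer updates they imply. Two assumptions are flagged: `"adamw_torch"` is taken as the AdamW variant (the card only says AdamW), and all 17,000 reviews are treated as training data, since the card gives no train/validation split. It also assumes a single device, no gradient accumulation, and that the last partial batch is kept.

```python
import math

# Hyperparameters from the card, keyed by TrainingArguments field names.
HPARAMS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "optim": "adamw_torch",  # assumption: PyTorch AdamW implementation
}


def total_optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Updates on one device, no gradient accumulation, last partial batch kept."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # Assumption: every one of the 17,000 reviews was used for training.
    steps = total_optimizer_steps(
        17_000,
        HPARAMS["per_device_train_batch_size"],
        HPARAMS["num_train_epochs"],
    )
    print(steps)  # 3189
```

With the library installed, the same dictionary can be unpacked as `TrainingArguments(output_dir="hotel-review-classifier", **HPARAMS)`, where the output directory name is hypothetical.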

Usage

The following code loads the model and classifies a single review:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("kmack/HotelReviewClassifier")
tokenizer = AutoTokenizer.from_pretrained("kmack/HotelReviewClassifier")

# Example review for prediction
review = "This is the best hotel I've ever stayed in!"

# Tokenize the input text (truncation keeps long reviews within the model's length limit)
inputs = tokenizer(review, return_tensors="pt", padding=True, truncation=True)

# Run a forward pass without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Get the predicted label as a Python int (0 for negative, 1 for positive)
prediction = torch.argmax(outputs.logits, dim=-1).item()
print(f"Predicted sentiment: {'Positive' if prediction == 1 else 'Negative'}")
```
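The argmax alone hides how confident the model is. The sketch below converts raw logits into a label plus its softmax probability, using the label order stated in the usage code (index 0 = Negative, 1 = Positive). It is pure Python so it runs without the model; the example logits are made up, and `logits_to_sentiment` is a hypothetical helper name.

```python
import math
from typing import List, Tuple

# Label order from the usage example: index 0 = Negative, 1 = Positive.
LABELS = ("Negative", "Positive")


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_sentiment(logits: List[List[float]]) -> List[Tuple[str, float]]:
    """Map each row of logits to (predicted label, probability of that label)."""
    results = []
    for row in logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((LABELS[best], probs[best]))
    return results


if __name__ == "__main__":
    # Made-up logits for two reviews, standing in for outputs.logits.tolist().
    for label, prob in logits_to_sentiment([[-2.0, 3.0], [1.5, -1.0]]):
        print(f"{label}: {prob:.3f}")
```

With the model loaded, pass `outputs.logits.tolist()` from the snippet above; in PyTorch the equivalent probabilities are `torch.softmax(outputs.logits, dim=-1)`.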

Citation

If you use this model in your research, please cite the following:

  author = {Kmack},
  title = {Hotel Review Classifier},
  year = {2024},
  url = {https://huggingface.co/kmack/HotelReviewClassifier}
}