---
license: mit
datasets:
- Aditya1010/17k-hotel-reviews-dataset
metrics:
- accuracy
base_model:
- distilbert/distilbert-base-uncased
pipeline_tag: text-classification
library_name: transformers
tags:
- Sentiment Analysis
- DistilBERT
- Text Classification
- Hotel Reviews
---

# Hotel Review Classifier

This model classifies the sentiment of hotel reviews as **positive** or **negative**. It was fine-tuned from the [DistilBERT model](https://huggingface.co/distilbert/distilbert-base-uncased) `distilbert-base-uncased` on the [17k Hotel Reviews Dataset](https://huggingface.co/datasets/Aditya1010/17k-hotel-reviews-dataset).

## Model Details

- **Model Type**: DistilBERT-based model for sequence classification
- **Model Architecture**: `distilbert-base-uncased`
- **Number of Parameters**: Approximately 66M
- **Training Dataset**: `17k-hotel-reviews-dataset`, which contains 17,000 hotel reviews labeled by sentiment (positive/negative)
- **Fine-Tuning Task**: Binary sentiment analysis of hotel reviews (positive or negative)

## Training Data

- **Dataset**: [17k Hotel Reviews Dataset](https://huggingface.co/datasets/Aditya1010/17k-hotel-reviews-dataset)
- **Data Description**: The dataset consists of 17,000 hotel reviews, each labeled with a sentiment (positive/negative).
- **Preprocessing**: The reviews were cleaned to remove unwanted characters and URLs.
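
The preprocessing step above is described only in general terms. The following is a minimal sketch of what such cleaning could look like, not the code actually used to prepare the training data. The function name `clean_review` and all three regular expressions are assumptions made for illustration; in particular, the model card does not say which characters counted as "unwanted".

```python
import re

# Hypothetical patterns: the model card does not specify the exact rules.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Assumption: keep letters, digits, whitespace, and basic punctuation only.
UNWANTED_PATTERN = re.compile(r"[^A-Za-z0-9\s.,!?'\-]")
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Remove URLs and unwanted characters, then normalize whitespace."""
    text = URL_PATTERN.sub(" ", text)       # drop URLs first so their punctuation goes too
    text = UNWANTED_PATTERN.sub(" ", text)  # replace any other disallowed character
    return WHITESPACE_PATTERN.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_review("Loved it! Visit http://example.com now #best"))
```

If inputs at inference time contain URLs or markup, applying comparable cleaning before tokenization would keep them closer to the cleaned text the model was trained on.
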
## Training Details

- **Training Framework**: Hugging Face Transformers and PyTorch
- **Learning Rate**: 2e-5
- **Epochs**: 3
- **Batch Size**: 16
- **Optimizer**: AdamW
- **Training Time**: Approximately 2 hours on a GPU

## Usage

Run inference with the following code:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("kmack/HotelReviewClassifier")
tokenizer = AutoTokenizer.from_pretrained("kmack/HotelReviewClassifier")

# Example review for prediction
review = "This is the best hotel I've ever stayed in!"

# Tokenize the input text
inputs = tokenizer(review, return_tensors="pt", padding=True, truncation=True)

# Get predictions without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Get the predicted label (0 for negative, 1 for positive) as a plain int
prediction = torch.argmax(outputs.logits, dim=-1).item()
print(f"Predicted sentiment: {'Positive' if prediction == 1 else 'Negative'}")
```

## Citation

If you use this model in your research, please cite it as follows:

```bibtex
@misc{hotel_review_classifier,
  author = {Kmack},
  title = {Hotel Review Classifier},
  year = {2024},
  url = {https://huggingface.co/kmack/HotelReviewClassifier}
}
```