Random Forest Sentiment Analysis Model
This model is a Random Forest classifier trained on the TripAdvisor sentiment analysis dataset. It takes the text of a hotel review as input and predicts a sentiment rating from 1 to 5 stars.
Model Details
- Model Type: Random Forest
- Task: Sentiment Analysis
- Input: A hotel review (text)
- Output: Sentiment rating (1-5 stars)
- Dataset Used: TripAdvisor sentiment dataset (balanced labels)
Intended Use
This model is designed to classify hotel reviews by sentiment. It assigns each review a star rating from 1 to 5 that reflects the sentiment expressed in the text.
How to Use the Model
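Install the required dependencies. The usage example imports joblib and huggingface_hub; scikit-learn is assumed here as well, since a Random Forest saved with joblib is typically a scikit-learn object that needs the library installed to load:
pip install joblib huggingface_hub scikit-learn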
Download the model from Hugging Face, load it with joblib, and use it to predict the sentiment of a review:
    from huggingface_hub import hf_hub_download
    import joblib

    # Download model from Hugging Face
    model_path = hf_hub_download(repo_id="your-username/random-forest-model", filename="random_forest_model.joblib")

    # Load the model
    model = joblib.load(model_path)

    # Predict sentiment of a review.
    # Passing raw text assumes the saved object is a pipeline that vectorizes text internally.
    def predict_sentiment(review):
        return model.predict([review])[0]

    review = "This hotel was fantastic. The service was great and the room was clean."
    print(f"Predicted sentiment: {predict_sentiment(review)}")
The model will return a sentiment rating between 1 and 5 stars, where:
- 1: Very bad
- 2: Bad
- 3: Neutral
- 4: Good
- 5: Very good
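The mapping above can be expressed as a small lookup helper. This is a minimal sketch: the names `RATING_LABELS` and `rating_to_label` are hypothetical and not part of the model, and it assumes predictions may arrive as floats such as `5.0`, as the label column of the classification report suggests.

```python
# Hypothetical helper: maps a predicted star rating to its description.
RATING_LABELS = {
    1: "Very bad",
    2: "Bad",
    3: "Neutral",
    4: "Good",
    5: "Very good",
}


def rating_to_label(rating):
    """Return the text label for a 1-5 star rating (int or float such as 5.0)."""
    stars = int(rating)
    # Reject fractional values (e.g. 3.5) and anything outside 1-5.
    if stars != rating or stars not in RATING_LABELS:
        raise ValueError(f"Expected a whole rating from 1 to 5, got {rating!r}")
    return RATING_LABELS[stars]


print(rating_to_label(5.0))  # Very good
print(rating_to_label(2))    # Bad
```

Raising on out-of-range values, rather than returning a default, makes it easier to notice if the model's label encoding differs from what this sketch assumes.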
Model Evaluation
Test accuracy: 55.28%.
Classification Report (Test Set):
Label | Precision | Recall | F1-score | Support |
---|---|---|---|---|
1.0 | 0.62 | 0.78 | 0.69 | 1600 |
2.0 | 0.48 | 0.38 | 0.42 | 1600 |
3.0 | 0.49 | 0.40 | 0.44 | 1600 |
4.0 | 0.49 | 0.46 | 0.48 | 1600 |
5.0 | 0.63 | 0.74 | 0.68 | 1600 |
Accuracy | - | - | 0.55 | 8000 |
Macro avg | 0.54 | 0.55 | 0.54 | 8000 |
Weighted avg | 0.54 | 0.55 | 0.54 | 8000 |
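Because every class has the same support (1600 reviews), the macro and weighted averages coincide, and macro-averaged recall equals overall accuracy. The sketch below recomputes the macro averages from the rounded per-class values in the table; the variable and function names are illustrative, and because the inputs are already rounded, the results are only expected to match the reported figures to two decimals.

```python
# Per-class (precision, recall, F1) values copied from the table above.
per_class = {
    1: (0.62, 0.78, 0.69),
    2: (0.48, 0.38, 0.42),
    3: (0.49, 0.40, 0.44),
    4: (0.49, 0.46, 0.48),
    5: (0.63, 0.74, 0.68),
}


def macro_average(rows):
    """Unweighted mean of each metric across classes."""
    columns = zip(*rows.values())
    return tuple(sum(col) / len(rows) for col in columns)


precision, recall, f1 = macro_average(per_class)
print(f"Macro avg: precision={precision:.2f}, recall={recall:.2f}, f1={f1:.2f}")
# Macro avg: precision=0.54, recall=0.55, f1=0.54
```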
Cross-validation Scores:
Metric | Value |
---|---|
Random Forest Cross-validation scores | [0.54983553, 0.55164474, 0.55805921, 0.55657895, 0.54424342] |
Random Forest Mean Cross-validation score | 0.5521 |
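The reported mean can be reproduced directly from the five fold scores. The sketch below uses only the Python standard library; the sample standard deviation is an extra illustration of fold-to-fold spread and is not a figure reported for this model.

```python
from statistics import mean, stdev

# Fold scores copied from the table above.
cv_scores = [0.54983553, 0.55164474, 0.55805921, 0.55657895, 0.54424342]

cv_mean = mean(cv_scores)
cv_std = stdev(cv_scores)  # sample standard deviation across the 5 folds

print(f"Mean CV score: {cv_mean:.4f}")  # Mean CV score: 0.5521
print(f"Std across folds: {cv_std:.4f}")
```

The mean cross-validation score (0.5521) is close to the test accuracy (55.28%), which is consistent with the model not overfitting heavily to a particular split.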
Limitations
- The model performs best on extreme ratings (F1 of 0.69 for 1 star and 0.68 for 5 stars) and struggles with intermediate ratings (F1 between 0.42 and 0.48 for 2, 3, and 4 stars).
- The model was trained on the TripAdvisor dataset and may not generalize well to reviews from other sources or domains.
- The model handles sarcasm and humor poorly, and predictions on shorter reviews may be less accurate.