---
license: mit
datasets:
- nhull/tripadvisor-split-dataset-v2
language:
- en
pipeline_tag: text-classification
tags:
- sentiment-analysis
- random-forest
- text-classification
- hotel-reviews
- tripadvisor
- nlp
---

# Random Forest Sentiment Analysis Model

This model is a **Random Forest** classifier trained on the **TripAdvisor sentiment analysis dataset**. Given the text of a hotel review, it predicts the review's sentiment as a rating on a 1-5 star scale.

## Model Details

- **Model Type**: Random Forest classifier
- **Task**: Sentiment analysis
- **Input**: A hotel review (text)
- **Output**: Sentiment rating (1-5 stars)
- **Dataset Used**: TripAdvisor sentiment dataset (`nhull/tripadvisor-split-dataset-v2`, balanced labels)
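
This card does not say how review text is turned into features for the Random Forest. Because the usage code in this card passes raw strings to `model.predict`, the saved object is presumably a scikit-learn pipeline with a text vectorizer in front of the classifier. The sketch below shows one such pipeline under stated assumptions: the TF-IDF vectorizer, the n-gram range, the hyperparameters, the helper name `build_pipeline`, and the toy reviews are all illustrative and are not the configuration used to train the published model.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline


def build_pipeline(n_estimators: int = 100, random_state: int = 42) -> Pipeline:
    """Hypothetical text pipeline: TF-IDF features -> Random Forest.

    The vectorizer and hyperparameters are assumptions for illustration,
    not the settings of the published model.
    """
    return Pipeline([
        ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
        ("rf", RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)),
    ])


# Tiny made-up corpus with star ratings (1-5) as labels
reviews = [
    "Terrible stay, dirty room and rude staff.",
    "Awful hotel, never again.",
    "Not great, the room was small and noisy.",
    "Below average, breakfast was cold.",
    "It was okay, nothing special.",
    "Average hotel, decent location.",
    "Good hotel, friendly staff.",
    "Nice room and good breakfast.",
    "Fantastic hotel, great service and a clean room.",
    "Excellent stay, would definitely come back.",
]
ratings = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]

pipeline = build_pipeline()
pipeline.fit(reviews, ratings)  # the pipeline accepts raw strings end to end
print(pipeline.predict(["The service was great and the room was clean."]))
```

Bundling the vectorizer and the classifier in one `Pipeline` is what lets a single `joblib` file accept plain review text, which matches how the loading code in this card calls the model.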

## Intended Use

This model is designed to classify hotel reviews by sentiment: it assigns each review a star rating from 1 to 5 that reflects the sentiment the review expresses.

## How to Use the Model

1. **Install the required dependencies**:

```bash
pip install joblib scikit-learn huggingface_hub
```

2. **Download and load the model**:

You can download the model from the Hugging Face Hub and use it to predict sentiment.

Example code to download and use the model:

```python
from huggingface_hub import hf_hub_download
import joblib

# Download the model file from the Hugging Face Hub
# (repo_id is a placeholder; point it at the repository hosting this model)
model_path = hf_hub_download(repo_id="your-username/random-forest-model", filename="random_forest_model.joblib")

# Load the model
model = joblib.load(model_path)

# Predict the sentiment of a single review; predict() expects a list of texts
def predict_sentiment(review):
    return model.predict([review])[0]

review = "This hotel was fantastic. The service was great and the room was clean."
print(f"Predicted sentiment: {predict_sentiment(review)}")
```

3. **The model returns a sentiment rating** between 1 and 5 stars, where:

- 1: Very bad
- 2: Bad
- 3: Neutral
- 4: Good
- 5: Very good
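
The classification report in this card lists the classes as `1.0` to `5.0`, so the raw prediction may come back as a float rather than an integer. A small helper such as the hypothetical `describe_rating` below can normalize it and attach the description from the scale above; it is a convenience sketch, not part of the published model.

```python
# Star-rating descriptions copied from the rating scale in this card.
RATING_LABELS = {1: "Very bad", 2: "Bad", 3: "Neutral", 4: "Good", 5: "Very good"}


def describe_rating(rating) -> str:
    """Turn a predicted rating such as 5 or 5.0 into readable text.

    Hypothetical helper: the evaluation report writes classes as floats
    (1.0 to 5.0), so the prediction is converted to an int before lookup.
    """
    stars = int(round(float(rating)))
    if stars not in RATING_LABELS:
        raise ValueError(f"Unexpected rating: {rating!r}")
    return f"{stars}/5 ({RATING_LABELS[stars]})"


print(describe_rating(5.0))  # 5/5 (Very good)
print(describe_rating(3))    # 3/5 (Neutral)
```

With the loading code from step 2, this would be called as `describe_rating(predict_sentiment(review))`.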

## Model Evaluation

- **Test Accuracy**: 55.28% on the test set (8,000 reviews).

- **Classification Report** (test set):

| Label | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 1.0 | 0.62 | 0.78 | 0.69 | 1600 |
| 2.0 | 0.48 | 0.38 | 0.42 | 1600 |
| 3.0 | 0.49 | 0.40 | 0.44 | 1600 |
| 4.0 | 0.49 | 0.46 | 0.48 | 1600 |
| 5.0 | 0.63 | 0.74 | 0.68 | 1600 |
| **Accuracy** | - | - | **0.55** | 8000 |
| **Macro avg** | 0.54 | 0.55 | 0.54 | 8000 |
| **Weighted avg** | 0.54 | 0.55 | 0.54 | 8000 |
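
Because every class has the same support (1600 test reviews), the macro and weighted averages coincide, and overall accuracy equals the mean per-class recall. The arithmetic below recomputes the averages from the rounded per-class values in the table; the helper names are illustrative.

```python
# Per-class values copied from the classification report above.
report = {
    # label: (precision, recall, f1, support)
    1: (0.62, 0.78, 0.69, 1600),
    2: (0.48, 0.38, 0.42, 1600),
    3: (0.49, 0.40, 0.44, 1600),
    4: (0.49, 0.46, 0.48, 1600),
    5: (0.63, 0.74, 0.68, 1600),
}


def macro_avg(index: int) -> float:
    """Unweighted mean of one metric column across the five classes."""
    return sum(row[index] for row in report.values()) / len(report)


def weighted_avg(index: int) -> float:
    """Support-weighted mean of one metric column across the five classes."""
    total = sum(row[3] for row in report.values())
    return sum(row[index] * row[3] for row in report.values()) / total


for name, i in [("precision", 0), ("recall", 1), ("f1", 2)]:
    print(f"{name}: macro={macro_avg(i):.2f} weighted={weighted_avg(i):.2f}")

# With equal supports, accuracy = sum(recall_i * n_i) / N = mean recall
print(f"accuracy from mean recall: {macro_avg(1):.3f}")
```

The mean of the rounded recalls (0.552) agrees with the reported test accuracy of 55.28% to within rounding.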

### Cross-validation Scores

5-fold cross-validation scores for the Random Forest model:

| Fold | Score |
|------|-------|
| 1 | 0.54983553 |
| 2 | 0.55164474 |
| 3 | 0.55805921 |
| 4 | 0.55657895 |
| 5 | 0.54424342 |
| **Mean** | **0.5521** |
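
The spread of the fold scores indicates how stable the estimate is. The snippet below recomputes the mean and adds the sample standard deviation using only the standard library. The card does not name the scoring metric; if the scores are accuracies (scikit-learn's default scorer for classifiers), the mean is consistent with the 55.28% test accuracy reported above.

```python
import statistics

# Five fold scores copied from the cross-validation table above.
cv_scores = [0.54983553, 0.55164474, 0.55805921, 0.55657895, 0.54424342]

mean_score = statistics.mean(cv_scores)
std_score = statistics.stdev(cv_scores)  # sample standard deviation (n - 1)

print(f"mean = {mean_score:.4f}, std = {std_score:.4f}")
print(f"range = {min(cv_scores):.4f} to {max(cv_scores):.4f}")
```

A standard deviation of roughly half a percentage point across folds suggests the 55% figure is not an artifact of one particular split.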

## Limitations

- The model performs best on the extreme ratings (F1-score of 0.69 for 1 star and 0.68 for 5 stars) and struggles with the intermediate ratings (F1-scores between 0.42 and 0.48 for 2, 3, and 4 stars).
- The model was trained on the **TripAdvisor** dataset and may not generalize well to reviews from other sources or domains.
- The model does not handle sarcasm or humor well, and shorter reviews may lead to less accurate predictions.