---
license: mit
datasets:
- nhull/tripadvisor-split-dataset-v2
language:
- en
pipeline_tag: text-classification
tags:
- sentiment-analysis
- random-forest
- text-classification
- hotel-reviews
- tripadvisor
- nlp
---

# Random Forest Sentiment Analysis Model

This model is a **Random Forest** classifier trained on the **TripAdvisor sentiment analysis dataset**. It takes the text of a hotel review as input and predicts a sentiment rating from 1 to 5 stars.

## Model Details

- **Model Type**: Random Forest
- **Task**: Sentiment Analysis
- **Input**: A hotel review (text)
- **Output**: Sentiment rating (1-5 stars)
- **Dataset Used**: TripAdvisor sentiment dataset (`nhull/tripadvisor-split-dataset-v2`, balanced labels)

## Intended Use

This model classifies hotel reviews by sentiment, assigning each review a star rating from 1 (most negative) to 5 (most positive).

## How to Use the Model

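1. **Install the required dependencies** (`huggingface_hub` for the download, `scikit-learn` so that `joblib` can deserialize the model):
    ```bash
    pip install joblib huggingface_hub scikit-learn
    ```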

2. **Download and load the model**:
    The following example downloads the model file from Hugging Face and uses it to predict the sentiment of a review:

    ```python
    from huggingface_hub import hf_hub_download
    import joblib

    # Download the model file from Hugging Face
    # (replace the placeholder repo_id with the actual repository name)
    model_path = hf_hub_download(repo_id="your-username/random-forest-model", filename="random_forest_model.joblib")

    # Load the model
    model = joblib.load(model_path)

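    # Predict the sentiment of a single review.
    # Passing raw text assumes the saved object includes its own text
    # vectorizer (for example, a scikit-learn Pipeline).
    def predict_sentiment(review):
        return model.predict([review])[0]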

    review = "This hotel was fantastic. The service was great and the room was clean."
    print(f"Predicted sentiment: {predict_sentiment(review)}")
    ```

3. **The model will return a sentiment rating** between 1 and 5 stars, where:
   - 1: Very bad
   - 2: Bad
   - 3: Neutral
   - 4: Good
   - 5: Very good
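
To show the mapping above in code, here is a minimal sketch that turns a predicted rating into its text label. The names `RATING_LABELS` and `describe_rating` are hypothetical helpers, not part of the model. The `int()` conversion is there on the assumption that predictions may come back as floats such as `5.0`, as the labels in the classification report suggest.

```python
# Labels copied from the rating list above.
# RATING_LABELS and describe_rating are illustrative names, not part of the model.
RATING_LABELS = {
    1: "Very bad",
    2: "Bad",
    3: "Neutral",
    4: "Good",
    5: "Very good",
}


def describe_rating(rating):
    """Convert a predicted rating (e.g. 4 or 4.0) into its text label."""
    return RATING_LABELS[int(rating)]


print(describe_rating(5.0))  # Very good
```

With the loaded model, this could be combined as `describe_rating(predict_sentiment(review))`.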

## Model Evaluation

- **Test Accuracy**: 55.28%
  
- **Classification Report** (Test Set):

| Label | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 1.0   | 0.62      | 0.78   | 0.69     | 1600    |
| 2.0   | 0.48      | 0.38   | 0.42     | 1600    |
| 3.0   | 0.49      | 0.40   | 0.44     | 1600    |
| 4.0   | 0.49      | 0.46   | 0.48     | 1600    |
| 5.0   | 0.63      | 0.74   | 0.68     | 1600    |
| **Accuracy** | -   | -      | **0.55**  | 8000    |
| **Macro avg** | 0.54 | 0.55   | 0.54     | 8000    |
| **Weighted avg** | 0.54 | 0.55 | 0.54     | 8000    |
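
As a quick check of how the averaged rows relate to the per-class rows, the sketch below recomputes the macro average from the rounded values in the table. The function name `macro_average` is an illustrative choice. Because every class has the same support (1600), the weighted average coincides with the macro average.

```python
# Per-class (precision, recall, F1) values copied from the table above.
per_class = {
    1: (0.62, 0.78, 0.69),
    2: (0.48, 0.38, 0.42),
    3: (0.49, 0.40, 0.44),
    4: (0.49, 0.46, 0.48),
    5: (0.63, 0.74, 0.68),
}


def macro_average(rows):
    """Unweighted mean of each metric across classes, rounded to 2 decimals."""
    n = len(rows)
    column_sums = [sum(column) for column in zip(*rows.values())]
    return tuple(round(total / n, 2) for total in column_sums)


print(macro_average(per_class))  # (0.54, 0.55, 0.54)
```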

### Cross-validation Scores

| Metric                              | Value                                      |
|-------------------------------------|--------------------------------------------|
| **Random Forest Cross-validation scores** | [0.54983553, 0.55164474, 0.55805921, 0.55657895, 0.54424342] |
| **Random Forest Mean Cross-validation score** | 0.5521                                     |
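
The mean in the table can be reproduced from the five fold scores. The sketch below uses only the Python standard library and also prints the sample standard deviation, which is not reported above, as a rough indication of how much the folds vary.

```python
from statistics import mean, stdev

# Fold scores copied from the table above.
cv_scores = [0.54983553, 0.55164474, 0.55805921, 0.55657895, 0.54424342]

cv_mean = mean(cv_scores)
cv_std = stdev(cv_scores)  # sample standard deviation across the five folds

print(f"Mean cross-validation score: {cv_mean:.4f}")  # 0.5521
print(f"Standard deviation across folds: {cv_std:.4f}")
```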

## Limitations

- The model performs best on extreme ratings (F1 of 0.69 for 1 star and 0.68 for 5 stars) but struggles with intermediate ratings (F1 between 0.42 and 0.48 for 2, 3, and 4 stars).
- The model was trained on the **TripAdvisor** dataset and may not generalize well to reviews from other sources or domains.
- The model handles sarcasm and humor poorly, and predictions may be less accurate for shorter reviews.