Random Forest Sentiment Analysis Model

This model is a Random Forest classifier trained on the TripAdvisor sentiment analysis dataset. It takes the text of a hotel review as input and predicts its sentiment as a rating from 1 to 5 stars.

Model Details

  • Model Type: Random Forest
  • Task: Sentiment Analysis
  • Input: A hotel review (text)
  • Output: Sentiment rating (1-5 stars)
  • Dataset Used: TripAdvisor sentiment dataset (balanced labels)

Intended Use

This model is designed to classify hotel reviews by sentiment. It assigns each review a star rating from 1 to 5 that reflects the sentiment the review expresses.

How to Use the Model

  1. Install the required dependencies:

    pip install joblib huggingface_hub scikit-learn
    
  2. Download and load the model. The example below downloads the model from Hugging Face and uses it to predict sentiment:

    from huggingface_hub import hf_hub_download
    import joblib
    
    # Download the model file from the Hugging Face Hub
    model_path = hf_hub_download(repo_id="nhull/random-forest-model", filename="random_forest_model.joblib")
    
    # Load the model
    model = joblib.load(model_path)
    
    # Predict the star rating for a single review.
    # predict() expects a list of inputs, so the review is wrapped in a list.
    def predict_sentiment(review):
        return model.predict([review])[0]
    
    review = "This hotel was fantastic. The service was great and the room was clean."
    print(f"Predicted sentiment: {predict_sentiment(review)}")
    
  3. The model will return a sentiment rating between 1 and 5 stars, where:

    • 1: Very bad
    • 2: Bad
    • 3: Neutral
    • 4: Good
    • 5: Very good
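
The mapping above can be sketched as a small helper that turns a predicted rating into its descriptive label. This is a minimal sketch, not part of the published model: `RATING_LABELS` and `describe_rating` are hypothetical names, and the helper accepts both integers and floats because the label type returned by `model.predict` depends on how the training labels were stored.

```python
# Hypothetical helper (not part of the published model): map a predicted
# star rating to the descriptive label listed above.
RATING_LABELS = {
    1: "Very bad",
    2: "Bad",
    3: "Neutral",
    4: "Good",
    5: "Very good",
}


def describe_rating(rating):
    """Return the descriptive label for a 1-5 star rating.

    Accepts ints or floats (for example 5 or 5.0).
    """
    stars = int(round(float(rating)))
    if stars not in RATING_LABELS:
        raise ValueError(f"rating must be between 1 and 5, got {rating!r}")
    return RATING_LABELS[stars]


print(describe_rating(5.0))  # Very good
```

With the model loaded as in step 2, `describe_rating(predict_sentiment(review))` would give the label for a review.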

Model Evaluation

  • Test Accuracy: 55.28%

  • Classification Report (Test Set):

    Label           Precision   Recall   F1-score   Support
    1.0             0.62        0.78     0.69       1600
    2.0             0.48        0.38     0.42       1600
    3.0             0.49        0.40     0.44       1600
    4.0             0.49        0.46     0.48       1600
    5.0             0.63        0.74     0.68       1600
    Accuracy        -           -        0.55       8000
    Macro avg       0.54        0.55     0.54       8000
    Weighted avg    0.54        0.55     0.54       8000
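
As a sanity check, the summary rows of the report can be recomputed from the per-class rows. The sketch below uses only the rounded figures printed in the report, so recomputed values agree with the reported ones only to within rounding (about ±0.01). The names `report`, `f1_score`, and `macro_average` are illustrative.

```python
# Per-class (precision, recall, F1-score) copied from the classification report.
report = {
    1: (0.62, 0.78, 0.69),
    2: (0.48, 0.38, 0.42),
    3: (0.49, 0.40, 0.44),
    4: (0.49, 0.46, 0.48),
    5: (0.63, 0.74, 0.68),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean across classes."""
    values = list(values)
    return sum(values) / len(values)


macro_precision = macro_average(p for p, _, _ in report.values())
macro_recall = macro_average(r for _, r, _ in report.values())
macro_f1 = macro_average(f for _, _, f in report.values())

print(f"Macro precision: {macro_precision:.2f}")  # 0.54
print(f"Macro recall:    {macro_recall:.2f}")     # 0.55
print(f"Macro F1:        {macro_f1:.2f}")         # 0.54

# Recompute each class's F1 from its rounded precision and recall.
for label, (p, r, reported_f1) in report.items():
    print(f"{label}: recomputed F1 = {f1_score(p, r):.3f}, reported = {reported_f1:.2f}")
```

Because every class has the same support (1600), the weighted averages equal the macro averages, and overall accuracy equals the mean per-class recall, which is why the macro recall of about 0.55 lines up with the 55.28% test accuracy.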

  • Cross-validation Scores:

    Metric                                       Value
    Random Forest cross-validation scores        [0.54983553, 0.55164474, 0.55805921, 0.55657895, 0.54424342]
    Random Forest mean cross-validation score    0.5521
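
The reported mean can be reproduced from the individual fold scores, and the spread across folds gives a rough sense of how stable the estimate is. This is a small sketch using only the numbers listed above. The model card does not say how cross-validation was configured; the five scores suggest five folds, but that is an assumption.

```python
from statistics import mean, stdev

# Fold scores copied from the cross-validation table above.
cv_scores = [0.54983553, 0.55164474, 0.55805921, 0.55657895, 0.54424342]

mean_score = mean(cv_scores)
spread = stdev(cv_scores)  # sample standard deviation across folds

print(f"Mean cross-validation score: {mean_score:.4f}")  # 0.5521
print(f"Standard deviation across folds: {spread:.4f}")
print(f"Lowest fold: {min(cv_scores):.4f}, highest fold: {max(cv_scores):.4f}")
```

The folds differ from one another by less than 1.5 percentage points, and the mean is close to the 55.28% test accuracy.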

Limitations

  • The model performs best on extreme ratings (F1-scores of 0.69 for 1 star and 0.68 for 5 stars) but struggles with intermediate ratings (F1-scores of 0.42 to 0.48 for 2, 3, and 4 stars).
  • The model was trained on the TripAdvisor dataset and may not generalize well to reviews from other sources or domains.
  • The model does not handle sarcasm or humor well, and predictions for shorter reviews may be less accurate.