---
license: mit
datasets:
- nhull/tripadvisor-split-dataset-v2
language:
- en
pipeline_tag: text-classification
tags:
- sentiment-analysis
- random-forest
- text-classification
- hotel-reviews
- tripadvisor
- nlp
---
# Random Forest Sentiment Analysis Model
This model is a **Random Forest** classifier trained on the **TripAdvisor sentiment analysis dataset**. Given the text of a hotel review, it predicts a sentiment rating on a 1-5 star scale.
## Model Details
- **Model Type**: Random Forest
- **Task**: Sentiment Analysis
- **Input**: A hotel review (text)
- **Output**: Sentiment rating (1-5 stars)
- **Dataset Used**: TripAdvisor sentiment dataset (balanced labels)
## Intended Use
This model is designed to classify hotel reviews based on their sentiment. It assigns a star rating between 1 and 5 to a review, indicating the sentiment expressed in the review.
## How to Use the Model
1. **Install the required dependencies** (`huggingface_hub` downloads the model file, `joblib` loads it, and `scikit-learn` is needed to deserialize the classifier):
```bash
pip install joblib scikit-learn huggingface_hub
```
2. **Download and load the model**:
Download the model from the Hugging Face Hub and use it to predict sentiment (see the note after this list if the model and vectorizer are stored as separate files):
```python
from huggingface_hub import hf_hub_download
import joblib

# Download the model file from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="your-username/random-forest-model",
    filename="random_forest_model.joblib",
)

# Load the serialized model
model = joblib.load(model_path)

# Predict the sentiment (1-5 stars) of a single review
def predict_sentiment(review):
    return model.predict([review])[0]

review = "This hotel was fantastic. The service was great and the room was clean."
print(f"Predicted sentiment: {predict_sentiment(review)}")
```
3. **The model will return a sentiment rating** between 1 and 5 stars, where:
- 1: Very bad
- 2: Bad
- 3: Neutral
- 4: Good
- 5: Very good
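**Note on step 2**: the snippet assumes the saved object is a scikit-learn `Pipeline` that already includes the text vectorizer, so it can accept raw strings. If the repository instead ships a bare classifier with the vectorizer saved separately, load both; a minimal sketch, assuming a hypothetical filename `tfidf_vectorizer.joblib`:
```python
from huggingface_hub import hf_hub_download
import joblib

# Hypothetical: only needed if the vectorizer is stored as a separate file
# ("tfidf_vectorizer.joblib" is an assumed filename, not confirmed by this repository)
vectorizer_path = hf_hub_download(
    repo_id="your-username/random-forest-model",
    filename="tfidf_vectorizer.joblib",
)
vectorizer = joblib.load(vectorizer_path)

def predict_sentiment_with_vectorizer(review):
    features = vectorizer.transform([review])  # raw text -> sparse TF-IDF features
    return model.predict(features)[0]          # `model` loaded as in step 2
```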
## Model Evaluation
- **Test Accuracy**: 55.28%
- **Classification Report** (Test Set):
| Label | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 1 | 0.62 | 0.78 | 0.69 | 1600 |
| 2 | 0.48 | 0.38 | 0.42 | 1600 |
| 3 | 0.49 | 0.40 | 0.44 | 1600 |
| 4 | 0.49 | 0.46 | 0.48 | 1600 |
| 5 | 0.63 | 0.74 | 0.68 | 1600 |
| **Accuracy** | - | - | **0.55** | 8000 |
| **Macro avg** | 0.54 | 0.55 | 0.54 | 8000 |
| **Weighted avg** | 0.54 | 0.55 | 0.54 | 8000 |
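The table above matches the format of scikit-learn's `classification_report`. A minimal sketch of how to reproduce these numbers on the test split, assuming the dataset's text and label columns are named `review` and `label` (the column names are an assumption; check the dataset card):
```python
from datasets import load_dataset
from sklearn.metrics import accuracy_score, classification_report

# Load the test split of the TripAdvisor dataset
test = load_dataset("nhull/tripadvisor-split-dataset-v2", split="test")
texts, labels = test["review"], test["label"]  # column names are assumptions

# `model` is the pipeline loaded in "How to Use the Model"
preds = model.predict(texts)
print(f"Accuracy: {accuracy_score(labels, preds):.4f}")
print(classification_report(labels, preds))
```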
### Cross-validation Scores
| Metric | Value |
|--------|-------|
| **5-fold cross-validation scores** | 0.5498, 0.5516, 0.5581, 0.5566, 0.5442 |
| **Mean cross-validation score** | 0.5521 |
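For context, 5-fold scores like these are typically produced with scikit-learn's `cross_val_score`. A sketch under assumed settings (the TF-IDF features and hyperparameters below are illustrative, not the exact training configuration):
```python
from datasets import load_dataset
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

# Illustrative pipeline; the actual vectorizer settings and forest hyperparameters are not documented here
pipeline = make_pipeline(TfidfVectorizer(), RandomForestClassifier(n_estimators=100, random_state=42))

# Column names "review" and "label" are assumptions; check the dataset card
train = load_dataset("nhull/tripadvisor-split-dataset-v2", split="train")
scores = cross_val_score(pipeline, train["review"], train["label"], cv=5)
print(scores, f"mean={scores.mean():.4f}")
```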
## Limitations
- The model performs best on the extreme ratings (1 and 5 stars) and struggles to separate the intermediate ratings (2, 3, and 4 stars), as the classification report above shows.
- The model was trained on the **TripAdvisor** dataset and may not generalize well to reviews from other sources or domains.
- It does not handle sarcasm or humor well, and very short reviews may yield less accurate predictions.