
Hindi Sentiment Analysis Model

This repository contains a Hindi sentiment analysis model that can classify text into three categories: negative (neg), neutral (neu), and positive (pos). The model has been trained and evaluated using various BERT-based architectures, with XLM-RoBERTa showing the best performance.

Model Performance

Test Accuracy Comparison

Evaluation on the held-out test set gives the following accuracies:

  • XLM-RoBERTa: 81.3%
  • mBERT: 76.5%
  • Custom-BERT-Attention: 74.9%
  • IndicBERT: 69.9%

Detailed Results

Confusion Matrices

The confusion matrices show the prediction performance for each model:

  • XLM-RoBERTa shows the strongest performance with 82.1% accuracy on positive class
  • mBERT demonstrates balanced performance across classes
  • Custom-BERT-Attention maintains consistent performance
  • IndicBERT shows room for improvement in negative class detection
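A confusion matrix of this kind can be computed directly from predicted and true labels. The sketch below is pure Python with illustrative predictions, not the actual test-set outputs:

```python
# Minimal confusion-matrix computation (illustrative labels only,
# not the real test-set predictions).
labels = ["neg", "neu", "pos"]
index = {label: i for i, label in enumerate(labels)}

y_true = ["pos", "pos", "neu", "neg", "neu", "pos"]
y_pred = ["pos", "pos", "neu", "neu", "neu", "pos"]

# rows = true class, columns = predicted class
cm = [[0] * len(labels) for _ in labels]
for t, p in zip(y_true, y_pred):
    cm[index[t]][index[p]] += 1

for label, row in zip(labels, cm):
    print(label, row)
```

With this convention, a model's per-class errors can be read off each row: off-diagonal entries are misclassifications of that true class.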

Per-class Metrics

The detailed per-class metrics show:

  1. Precision:

    • Positive class: Best performance across all models (~0.80-0.85)
    • Neutral class: Consistent performance (~0.75-0.80)
    • Negative class: More varied performance (~0.40-0.70)
  2. Recall:

    • Positive class: High recall across models (~0.85-0.90)
    • Neutral class: Moderate recall (~0.65-0.85)
    • Negative class: Lower but improving recall (~0.25-0.60)
  3. F1-Score:

    • Positive class: Best overall performance (~0.80-0.85)
    • Neutral class: Good balance (~0.70-0.80)
    • Negative class: Area for potential improvement (~0.30-0.65)
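The per-class figures above follow directly from a confusion matrix: precision divides each diagonal entry by its column sum, recall by its row sum. A minimal sketch, using an illustrative (made-up) matrix rather than the real evaluation results:

```python
def per_class_metrics(cm, labels):
    """Precision/recall/F1 per class from a confusion matrix
    (rows = true class, columns = predicted class)."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)  # column sum
        actual = sum(cm[i])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics

# Illustrative matrix only -- not the actual evaluation numbers.
labels = ["neg", "neu", "pos"]
cm = [
    [40, 15, 5],    # true neg
    [10, 70, 20],   # true neu
    [5, 10, 85],    # true pos
]
for label, (p, r, f) in per_class_metrics(cm, labels).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```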

Training Progress

The training graphs show:

  • Consistent loss reduction across epochs
  • Stable validation accuracy improvement
  • No significant overfitting
  • XLM-RoBERTa achieving the best validation accuracy
  • Custom-BERT-Attention showing rapid initial learning

Model Usage

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("madhav112/hindi-sentiment-analysis")
model = AutoModelForSequenceClassification.from_pretrained("madhav112/hindi-sentiment-analysis")
model.eval()

# Example usage ("This film is very good")
text = "यह फिल्म बहुत अच्छी है"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = outputs.logits.argmax(dim=-1).item()
```
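The predicted index can be mapped back to a label and a confidence score via a softmax over the logits. The label order below (neg, neu, pos) is an assumption and should be verified against the model's `config.id2label`; the logits are illustrative:

```python
import math

# Hypothetical label order -- verify against model.config.id2label.
id2label = {0: "neg", 1: "neu", 2: "pos"}

def softmax(logits):
    """Convert raw logits to probabilities (pure-Python sketch)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits, e.g. from outputs.logits[0].tolist()
logits = [-1.2, 0.3, 2.1]
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)
print(id2label[pred], round(probs[pred], 3))
```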

Model Architecture

The repository contains experiments with multiple BERT-based architectures:

  1. XLM-RoBERTa (Best performing)

    • Highest overall accuracy
    • Best performance on positive sentiment
    • Strong cross-lingual capabilities
  2. mBERT

    • Good balanced performance
    • Strong on neutral class detection
    • Consistent across all metrics
  3. Custom-BERT-Attention

    • Competitive performance
    • Quick convergence during training
    • Good precision on positive class
  4. IndicBERT

    • Baseline performance
    • Room for improvement
    • Better suited for specific Indian language tasks

Dataset

The model was trained on a Hindi sentiment analysis dataset with three classes:

  • Positive (pos)
  • Neutral (neu)
  • Negative (neg)

The confusion matrices indicate a broadly balanced class distribution, with performance strongest on the positive class.

Training Details

The models were trained for 7 epochs with the following setup:

  • Learning rate: tuned per architecture
  • Batch size: tuned per architecture
  • Validation: regular evaluation on a held-out split during training
  • Early stopping: the best checkpoint was selected on validation performance
  • Loss function: cross-entropy
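As a concrete illustration of the cross-entropy objective, the loss for a single example is the negative log-probability of the true class under the softmax of the logits. A pure-Python sketch with illustrative numbers:

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    m = max(logits)  # log-sum-exp with max subtracted for stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

# Illustrative: the model strongly favours class 2, which is correct here...
loss_confident = cross_entropy([-1.2, 0.3, 2.1], target=2)
# ...versus the same logits when the true class is actually 0.
loss_wrong = cross_entropy([-1.2, 0.3, 2.1], target=0)
print(round(loss_confident, 3), round(loss_wrong, 3))
```

A confident correct prediction yields a small loss, while the same prediction against the wrong target is penalised by exactly the logit gap between the two classes.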

Limitations

  • Lower performance on negative sentiment detection compared to positive
  • Neutral class classification shows moderate confusion with both positive and negative
  • Performance may vary on domain-specific text
  • Best suited for standard Hindi text; may have reduced performance on heavily colloquial or dialectal variations

Citation

If you use this model in your research, please cite:

```bibtex
@misc{madhav2024hindisentiment,
  author = {Madhav},
  title = {Hindi Sentiment Analysis Model},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/madhav112/hindi-sentiment-analysis}}
}
```

Author

Madhav

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Special thanks to the HuggingFace team and the open-source community for providing the tools and frameworks that made this model possible.

Model Card Metadata

```yaml
language: hi
tags:
  - hindi
  - sentiment-analysis
  - text-classification
  - bert
datasets:
  - hindi-sentiment
metrics:
  - accuracy
  - f1
  - precision
  - recall
model-index:
  - name: hindi-sentiment-analysis
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          name: Hindi Sentiment
          type: hindi-sentiment
        metrics:
          - type: accuracy
            value: 81.3
            name: Test Accuracy
          - type: f1
            value: 0.82
            name: F1 Score
```