metadata
language: es
tags:
  - sagemaker
  - ruperta
  - TextClassification
  - SentimentAnalysis
license: apache-2.0
datasets:
  - IMDbreviews_es
model-index: null
name: RuPERTa_base_sentiment_analysis_es
results:
  - task:
      name: Sentiment Analysis
      type: sentiment-analysis
  - dataset:
      name: IMDb Reviews in Spanish
      type: IMDbreviews_es
  - metrics:
      - name: Accuracy
        type: accuracy
        value: 0.881866
      - name: F1 Score
        type: f1
        value: 0.888272
      - name: Precision
        type: precision
        value: 0.858605
      - name: Recall
        type: recall
        value: 0.920062
widget:
  - text: >-
      Se trata de una película interesante, con un solido argumento y un gran
      interpretación de su actor principal

Model RuPERTa_base_sentiment_analysis_es

A fine-tuned model for sentiment analysis in Spanish

This model was trained using Amazon SageMaker and the new Hugging Face Deep Learning Container. The base model is RuPERTa-base (uncased), a RoBERTa model trained on an uncased version of a large Spanish corpus. The base model was trained by Manuel Romero (mrm8488) and is published as mrm8488/RuPERTa-base.

Dataset

The dataset is a collection of about 50,000 movie reviews in Spanish. It is balanced and provides each review in both English and Spanish, with the label also given in both languages.

Sizes of datasets:

  • Train dataset: 42,500
  • Validation dataset: 3,750
  • Test dataset: 3,750
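
The three sizes add up to 50,000 and correspond to an 85% / 7.5% / 7.5% split. This card does not say how the split was produced, so the following is only a minimal sketch of one way to obtain partitions of these sizes. The helper name `split_indices`, the seed, and the shuffle-then-slice approach are all assumptions made for illustration.

```python
import random


def split_indices(n_examples, train_frac=0.85, val_frac=0.075, seed=42):
    """Shuffle example indices and cut them into train/validation/test parts.

    Hypothetical helper: the original split procedure is not documented.
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle

    n_train = round(n_examples * train_frac)
    n_val = round(n_examples * val_frac)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains after train and validation
    return train, val, test


train_idx, val_idx, test_idx = split_indices(50_000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The resulting index lists can then be used to select rows from whatever container holds the reviews.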

Hyperparameters

{
  "epochs": "4",
  "train_batch_size": "32",
  "eval_batch_size": "8",
  "fp16": "true",
  "learning_rate": "3e-05",
  "model_name": "\"mrm8488/RuPERTa-base\"",
  "sagemaker_container_log_level": "20",
  "sagemaker_program": "\"train.py\""
}
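
Every value in the listing above is a string, and the string-valued ones carry an extra layer of escaped quotes. That is consistent with hyperparameters that were JSON-encoded before being logged. Assuming that is the case here, a minimal sketch of recovering typed values could look like this; `decode_hyperparameters` is a hypothetical helper, not part of any SDK.

```python
import json

# Values copied from the listing above; each one is assumed to be JSON-encoded.
raw_hyperparameters = {
    "epochs": "4",
    "train_batch_size": "32",
    "eval_batch_size": "8",
    "fp16": "true",
    "learning_rate": "3e-05",
    "model_name": "\"mrm8488/RuPERTa-base\"",
    "sagemaker_container_log_level": "20",
    "sagemaker_program": "\"train.py\"",
}


def decode_hyperparameters(raw):
    """Decode each JSON-encoded value into a Python int, float, bool or str."""
    return {name: json.loads(value) for name, value in raw.items()}


hyperparameters = decode_hyperparameters(raw_hyperparameters)
for name, value in hyperparameters.items():
    print(f"{name}: {value!r} ({type(value).__name__})")
```

After decoding, `epochs` and the batch sizes are integers, `fp16` is a boolean, `learning_rate` is a float, and the model name loses its extra quotes.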

Evaluation results

  • Accuracy = 0.8629333333333333
  • F1 Score = 0.8648790746582545
  • Precision = 0.8479381443298969
  • Recall = 0.8825107296137339

Test results

  • Accuracy = 0.8066666666666666
  • F1 Score = 0.8057862309134743
  • Precision = 0.7928307854507116
  • Recall = 0.8191721132897604
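
F1 is the harmonic mean of precision and recall, so the reported F1 scores can be cross-checked from the other two numbers. The sketch below does that for the evaluation and test figures quoted above. The helper name `f1_from_precision_recall` is made up for this illustration.

```python
import math


def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as quoted in this model card
reported = {
    "evaluation": (0.8479381443298969, 0.8825107296137339, 0.8648790746582545),
    "test": (0.7928307854507116, 0.8191721132897604, 0.8057862309134743),
}

for split, (precision, recall, reported_f1) in reported.items():
    computed_f1 = f1_from_precision_recall(precision, recall)
    # Allow a small tolerance for floating-point rounding
    consistent = math.isclose(computed_f1, reported_f1, abs_tol=1e-6)
    print(f"{split}: computed F1 = {computed_f1:.6f}, matches reported: {consistent}")
```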

Model in action

Usage for Sentiment Analysis

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
model_id = "edumunozsala/RuPERTa_base_sentiment_analysis_es"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "Se trata de una película interesante, con un solido argumento y un gran interpretación de su actor principal"

# Tokenize into a batch of one and run a forward pass without tracking gradients
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Index of the highest-scoring class
output = outputs.logits.argmax(dim=1)
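
`argmax` returns only the index of the winning class. To also get a confidence score, the logits can be passed through a softmax. The sketch below uses plain Python on made-up logits so that it runs without downloading the model. The example logits are invented, and this card does not state which index means positive or negative, so check `model.config.id2label` before interpreting the predicted index.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Invented logits for a single review with two classes (illustration only)
example_logits = [-1.2, 2.3]

probabilities = softmax(example_logits)
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
print(predicted_class, probabilities[predicted_class])
```

With the real model, `outputs.logits[0].tolist()` would supply the list of logits for the first (and only) review in the batch.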

Created by Eduardo Muñoz/@edumunozsala