---
language: en
license: apache-2.0
datasets:
- amazon_reviews_multi
model-index:
- name: distilbert-base-uncased-finetuned-amazon-reviews
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: amazon-reviews-multi
      name: amazon_reviews_multi
      split: test
    metrics:
    - type: accuracy
      value: 0.8558
      name: Accuracy top2
    - type: loss
      value: 1.2339
      name: Loss
tags:
- generated_from_keras_callback
pipeline_tag: text-classification
---
Model Card for distilbert-base-uncased-finetuned-amazon-reviews
Table of Contents
- Model Card for distilbert-base-uncased-finetuned-amazon-reviews
- Table of Contents
- Model Details
- Uses
- Training Details
- Fine-tuning hyperparameters
- Accuracy
- Framework versions
Model Details
Model Description
This model is a fine-tuned version of distilbert-base-uncased on the amazon_reviews_multi dataset. It reaches an exact-match accuracy of 56.96% (85.50% off-by-1) on the dev set.
- Model type: Language model (DistilBERT) fine-tuned for text classification
- Language(s) (NLP): en
- License: apache-2.0
- Parent Model: distilbert-base-uncased. For more details about DistilBERT, see the distilbert-base-uncased model card.
- Resources for more information:
Uses
You can use this model directly with a pipeline for text classification.
```python
from transformers import pipeline

checkpoint = "amir7d0/distilbert-base-uncased-finetuned-amazon-reviews"
classifier = pipeline("text-classification", model=checkpoint)

# Returns one {'label': ..., 'score': ...} dict per input text.
classifier(["Replace me by any text you'd like."])
```
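Or, to load the tokenizer and model directly in TensorFlow:

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

checkpoint = "amir7d0/distilbert-base-uncased-finetuned-amazon-reviews"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="tf")
output = model(encoded_input)

# Pick the highest-scoring class and look up its label name.
predicted_id = int(tf.math.argmax(output.logits, axis=-1)[0])
print(model.config.id2label[predicted_id])
```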
Training Details
Training and Evaluation Data
The model was fine-tuned on the amazon_reviews_multi dataset. The dataset contains 200,000, 5,000, and 5,000 reviews in the training, dev, and test sets, respectively.
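The card does not say how star ratings were encoded as class labels. The following minimal sketch shows one common choice, under stated assumptions. It assumes a 5-way classifier with one class per star rating and the mapping `label = stars - 1`. It also assumes the `stars` and `review_body` field names from the amazon_reviews_multi schema. The helper names (`stars_to_label`, `label_to_stars`, `prepare_examples`) are hypothetical.

```python
from typing import Dict, List

NUM_LABELS = 5  # assumption: one class per star rating (1 to 5)


def stars_to_label(stars: int) -> int:
    """Map a 1-5 star rating to a 0-indexed class id (hypothetical encoding)."""
    if not 1 <= stars <= NUM_LABELS:
        raise ValueError(f"stars must be in [1, {NUM_LABELS}], got {stars}")
    return stars - 1


def label_to_stars(label: int) -> int:
    """Inverse mapping, useful for reading predicted class ids as star ratings."""
    return label + 1


def prepare_examples(rows: List[Dict]) -> List[Dict]:
    """Keep the review text and attach an integer class label to each row."""
    return [
        {"text": row["review_body"], "label": stars_to_label(row["stars"])}
        for row in rows
    ]


# Toy rows shaped like amazon_reviews_multi records (made-up text, not real data).
rows = [
    {"review_body": "Broke after one day.", "stars": 1},
    {"review_body": "Works exactly as described.", "stars": 5},
]
examples = prepare_examples(rows)
print(examples)
# [{'text': 'Broke after one day.', 'label': 0}, {'text': 'Works exactly as described.', 'label': 4}]
```

A 0-indexed mapping like this matches what sequence-classification heads typically expect. Check the model's `id2label` config before relying on it.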
Fine-tuning hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 5
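With the 200,000-review training set and a batch size of 16, this schedule works out to 12,500 optimizer steps per epoch and 62,500 steps over 5 epochs. The sketch below makes that arithmetic explicit. It collects the listed hyperparameters in a hypothetical `FinetuneConfig` dataclass. It assumes every training example is seen once per epoch and that a final partial batch counts as one step.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Hypothetical container for the hyperparameters listed above."""
    learning_rate: float = 2e-5
    train_batch_size: int = 16
    eval_batch_size: int = 16
    seed: int = 42
    adam_betas: tuple = (0.9, 0.999)
    adam_epsilon: float = 1e-8
    num_epochs: int = 5


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    # A final partial batch still costs one optimizer step.
    return math.ceil(num_examples / batch_size)


cfg = FinetuneConfig()
train_size = 200_000  # training split size stated in this card
per_epoch = steps_per_epoch(train_size, cfg.train_batch_size)
total = per_epoch * cfg.num_epochs
print(per_epoch, total)  # 12500 62500
```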
Accuracy
The fine-tuned model was evaluated on the English test set of amazon_reviews_multi.
- Accuracy (exact) is the percentage of reviews where the predicted number of stars exactly matches the number given by the human reviewer.
- Accuracy (off-by-1) is the percentage of reviews where the number of stars the model predicts differs by at most 1 from the number given by the human reviewer.
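Both metrics can be computed directly from predicted and gold star ratings. The following is a minimal sketch, not the evaluation script used for this card. The function name `star_accuracies` is hypothetical, and both sequences are assumed to hold integer ratings on the same 1 to 5 scale.

```python
from typing import Sequence, Tuple


def star_accuracies(predicted: Sequence[int], gold: Sequence[int]) -> Tuple[float, float]:
    """Return (exact, off_by_1) accuracy for predicted vs. human star ratings."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal in length")
    exact = sum(p == g for p, g in zip(predicted, gold)) / len(gold)
    # Off-by-1 counts a prediction as correct when it is within one star of the gold rating,
    # so every exact match also counts as an off-by-1 match.
    off_by_1 = sum(abs(p - g) <= 1 for p, g in zip(predicted, gold)) / len(gold)
    return exact, off_by_1


# Toy example (made-up ratings, not the card's evaluation data).
pred = [5, 4, 1, 3, 2]
gold = [5, 5, 3, 3, 4]
print(star_accuracies(pred, gold))  # (0.4, 0.6)
```

Because off-by-1 includes exact matches, it is always at least as high as exact accuracy, which is consistent with the table.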
| Split | Accuracy (exact) | Accuracy (off-by-1) |
|---|---|---|
| Dev set | 56.96% | 85.50% |
| Test set | 57.36% | 85.58% |
Framework versions
- Transformers 4.26.1
- TensorFlow 2.11.0
- Datasets 2.1.0
- Tokenizers 0.13.2