---
language: en
license: apache-2.0
datasets:
- amazon_reviews_multi
model-index:
- name: distilbert-base-uncased-finetuned-amazon-reviews
results:
- task:
type: text-classification
name: Text Classification
dataset:
type: amazon_reviews_multi
name: amazon_reviews_multi
split: test
metrics:
- type: accuracy
value: 0.8558
name: Accuracy top2
- type: loss
value: 1.2339
name: Loss
tags:
- generated_from_keras_callback
pipeline_tag: text-classification
---
# Model Card for distilbert-base-uncased-finetuned-amazon-reviews
# Table of Contents
- [Model Card for distilbert-base-uncased-finetuned-amazon-reviews](#model-card-for-distilbert-base-uncased-finetuned-amazon-reviews)
- [Table of Contents](#table-of-contents)
- [Model Details](#model-details)
- [Uses](#uses)
- [Training Details](#training-details)
- [Evaluation](#evaluation)
- [Framework versions](#framework-versions)
# Model Details
## Model Description
<!-- Provide a longer summary of what this model is/does. -->
This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the [amazon_reviews_multi](https://huggingface.co/datasets/amazon_reviews_multi) dataset.
It reaches an exact-match accuracy of 56.96% (85.50% off-by-1) on the dev set.
- **Model type:** Language model
- **Language(s) (NLP):** en
- **License:** apache-2.0
- **Parent Model:** For more details about DistilBERT, check out [this model card](https://huggingface.co/distilbert-base-uncased).
- **Resources for more information:**
- [Model Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/distilbert#transformers.DistilBertForSequenceClassification)
# Uses
You can use this model directly with a `pipeline` for text classification:
```python
from transformers import pipeline

checkpoint = "amir7d0/distilbert-base-uncased-finetuned-amazon-reviews"
classifier = pipeline("text-classification", model=checkpoint)
classifier(["Replace me with any text you'd like."])
```
Or load the model and tokenizer directly in TensorFlow:
```python
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

checkpoint = "amir7d0/distilbert-base-uncased-finetuned-amazon-reviews"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)

text = "Replace me with any text you'd like."
encoded_input = tokenizer(text, return_tensors="tf")
output = model(encoded_input)
```
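The model returns raw logits over five labels. A minimal sketch of turning one row of logits into a 1-5 star prediction, assuming label index `i` corresponds to `i + 1` stars as in the amazon_reviews_multi ratings (the helper name and logit values below are illustrative, not real model output):

```python
import numpy as np

def stars_from_logits(logits):
    """Map a row of 5 classification logits to a 1-5 star prediction.

    Assumes label index i corresponds to (i + 1) stars.
    """
    # Numerically stable softmax over the logits
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    return int(np.argmax(probs)) + 1, probs

# Made-up logits for illustration:
stars, probs = stars_from_logits(np.array([0.1, 0.2, 0.3, 2.5, 1.0]))
print(stars)  # 4
```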
# Training Details
## Training and Evaluation Data
The model was fine-tuned on the raw [amazon_reviews_multi](https://huggingface.co/datasets/amazon_reviews_multi) dataset, which contains 200,000 reviews in the training set and 5,000 reviews each in the dev and test sets.
## Fine-tuning hyperparameters
The following hyperparameters were used during training:
+ learning_rate: 2e-05
+ train_batch_size: 16
+ eval_batch_size: 16
+ seed: 42
+ optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ lr_scheduler_type: linear
+ num_epochs: 5
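As an illustration of the schedule above, a linear scheduler decays the learning rate from 2e-05 toward 0 over the total number of training steps. A minimal sketch (the step count is derived from the 200,000-example training set, batch size 16, and 5 epochs; zero warmup steps is an assumption):

```python
def linear_lr(step, total_steps, base_lr=2e-5):
    """Linearly decay base_lr to 0 over total_steps (assumes no warmup)."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

# 200,000 training examples / batch size 16 * 5 epochs = 62,500 steps
total_steps = 200_000 // 16 * 5
print(linear_lr(0, total_steps))            # base_lr at the start
print(linear_lr(total_steps, total_steps))  # 0.0 at the end
```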
## Accuracy
The fine-tuned model was evaluated on the test set of `amazon_reviews_multi`.
- Accuracy (exact) is the percentage of reviews where the number of stars the model predicts exactly matches the number given by the human reviewer.
- Accuracy (off-by-1) is the percentage of reviews where the number of stars the model predicts differs by a maximum of 1 from the number given by the human reviewer.
| Split    | Accuracy (exact) | Accuracy (off-by-1) |
| -------- | ---------------- | ------------------- |
| Dev set  | 56.96%           | 85.50%              |
| Test set | 57.36%           | 85.58%              |
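Both metrics can be computed from paired predictions and gold star ratings. A minimal sketch (the helper name and sample values are illustrative):

```python
def star_accuracies(preds, golds):
    """Exact and off-by-1 accuracy for star predictions (both in 1..5)."""
    n = len(golds)
    exact = sum(p == g for p, g in zip(preds, golds)) / n
    # Off-by-1: prediction differs from the gold rating by at most 1 star
    off_by_1 = sum(abs(p - g) <= 1 for p, g in zip(preds, golds)) / n
    return exact, off_by_1

exact, off1 = star_accuracies([1, 2, 4, 5], [1, 3, 2, 5])
print(exact, off1)  # 0.5 0.75
```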
# Framework versions
- Transformers 4.26.1
- TensorFlow 2.11.0
- Datasets 2.1.0
- Tokenizers 0.13.2