Update README.md
README.md CHANGED
```diff
@@ -1,59 +1,54 @@
 ---
-language:
-- en
 license: apache-2.0
 tags:
-- generated_from_trainer
-datasets:
-- glue
 metrics:
-- accuracy
 - f1
-model-index:
-- name: bert-base-uncased-mrpc
-  results:
-  - task:
-      name: Text Classification
-      type: text-classification
-    dataset:
-      name: GLUE MRPC
-      type: glue
-      args: mrpc
-    metrics:
-    - name: Accuracy
-      type: accuracy
-      value: 0.8602941176470589
-    - name: F1
-      type: f1
-      value: 0.9042016806722689
 ---
 
-<!-- This model card was generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 
-# bert-base-uncased-mrpc
 
-This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the GLUE MRPC dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.6978
-- Accuracy: 0.8603
-- F1: 0.9042
-- Combined Score: 0.8822
 
-### Training hyperparameters
 
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
-- train_batch_size: 16
-- eval_batch_size: 8
-- seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- num_epochs:
 
-### Framework versions
 
-- Transformers …
-- Pytorch …
-- Datasets …
-- Tokenizers …
```
The updated model card:

---
language: en
license: apache-2.0
tags:
- text-classification
- int8
- QuantizationAwareTraining
datasets:
- mrpc
metrics:
- f1
---

# INT8 BERT base uncased finetuned MRPC

### QuantizationAwareTraining

This is an INT8 PyTorch model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor) as the quantization backend, via [intel/nlp-toolkit](https://github.com/intel/nlp-toolkit). The original FP32 model is the fine-tuned [Intel/bert-base-uncased-mrpc](https://huggingface.co/Intel/bert-base-uncased-mrpc).
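
For reference, quantization-aware training with nlp-toolkit wraps the usual transformers fine-tuning loop in its trainer. The sketch below is illustrative only: it follows the toolkit's documented Trainer-based API, and the exact `NLPTrainer`/`QuantizationConfig` names and arguments are assumptions, not this repository's actual training script.

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

from nlp_toolkit import QuantizationConfig, metrics
from nlp_toolkit.optimization.trainer import NLPTrainer

# GLUE MRPC sentence pairs, tokenized with the FP32 model's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("Intel/bert-base-uncased-mrpc")
raw = load_dataset("glue", "mrpc")
encoded = raw.map(
    lambda ex: tokenizer(ex["sentence1"], ex["sentence2"],
                         truncation=True, padding="max_length", max_length=128),
    batched=True,
)

# Start quantization-aware training from the fine-tuned FP32 checkpoint.
model = AutoModelForSequenceClassification.from_pretrained("Intel/bert-base-uncased-mrpc")
trainer = NLPTrainer(model=model,
                     train_dataset=encoded["train"],
                     eval_dataset=encoded["validation"],
                     tokenizer=tokenizer)

# Accept the INT8 model if eval_f1 stays within 1% (relative) of the FP32 baseline.
tune_metric = metrics.Metric(name="eval_f1", is_relative=True, criterion=0.01)
quant_config = QuantizationConfig(approach="QuantizationAwareTraining",
                                  metrics=[tune_metric])
int8_model = trainer.quantize(quant_config=quant_config)
```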

#### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 2e-05
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
- train_batch_size: 8
- eval_batch_size: 8
- eval_steps: 100
- load_best_model_at_end: True
- metric_for_best_model: f1
- early_stopping_patience: 6
- early_stopping_threshold: 0.001
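
For orientation, these settings map onto the standard transformers API roughly as follows. This is a sketch, not the repository's actual script: the output path is hypothetical, the Adam betas and epsilon shown above are the library defaults, and early stopping comes from `EarlyStoppingCallback`. A plain `Trainer` is shown; the `NLPTrainer` above extends it and takes the same arguments.

```python
import numpy as np
import evaluate
from transformers import TrainingArguments, Trainer, EarlyStoppingCallback

glue_metric = evaluate.load("glue", "mrpc")

def compute_metrics(eval_pred):
    # Supplies eval_f1 / eval_accuracy so metric_for_best_model="f1" resolves.
    logits, labels = eval_pred
    return glue_metric.compute(predictions=np.argmax(logits, axis=-1),
                               references=labels)

training_args = TrainingArguments(
    output_dir="./bert-base-uncased-mrpc-qat",  # hypothetical path
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy="steps",
    eval_steps=100,
    save_strategy="steps",
    save_steps=100,            # best-model tracking needs saves aligned with evals
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)

trainer = Trainer(
    model=model,               # the FP32 checkpoint from the sketch above
    args=training_args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=6,
                                     early_stopping_threshold=0.001)],
)
```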

### Test result

- Batch size = 8
- [Amazon Web Services](https://aws.amazon.com/) c6i.xlarge instance (Intel Ice Lake, 4 vCPUs, 8 GB memory).

| | INT8 | FP32 |
|---|:---:|:---:|
| **Throughput (samples/sec)** | 24.263 | 11.202 |
| **Accuracy (eval-f1)** | 0.9153 | 0.9042 |
| **Model size (MB)** | 174 | 418 |

### Load with nlp-toolkit

```python
from nlp_toolkit import OptimizedModel

int8_model = OptimizedModel.from_pretrained(
    'Intel/bert-base-uncased-mrpc-int8-qat',  # this INT8 model repository
)
```
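
Once loaded, the model can be exercised like any transformers sequence classifier. A quick sanity check follows; the sentence pair is an arbitrary example, and the HF-style output with a `.logits` field is an assumption about what `OptimizedModel` returns:

```python
import torch
from transformers import AutoTokenizer

# The INT8 model reuses the FP32 base model's tokenizer.
tokenizer = AutoTokenizer.from_pretrained('Intel/bert-base-uncased-mrpc')

inputs = tokenizer(
    "The company said quarterly profits rose sharply.",
    "Quarterly profits at the company increased strongly, it said.",
    return_tensors="pt",
)
with torch.no_grad():
    logits = int8_model(**inputs).logits
print(logits.argmax(dim=-1).item())  # 1 = paraphrase, 0 = not a paraphrase
```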

Notes:
- The INT8 model only shows its performance advantage over the FP32 model when the CPU is fully loaded; under light load, measurements can give the misleading impression that INT8 is slower than FP32.
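
A throughput figure like the one in the table above can be approximated with a simple timing loop. A minimal sketch, using a batch of 8 as in the benchmark; the sequence length of 128 and the iteration counts are arbitrary choices:

```python
import time
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Intel/bert-base-uncased-mrpc')

# A synthetic batch of 8 padded sentence pairs.
pairs = ["The company said quarterly profits rose sharply."] * 8
batch = tokenizer(pairs, pairs, padding="max_length", max_length=128,
                  truncation=True, return_tensors="pt")

with torch.no_grad():
    for _ in range(10):                  # warm-up iterations
        int8_model(**batch)
    start = time.perf_counter()
    for _ in range(50):
        int8_model(**batch)
    elapsed = time.perf_counter() - start

print(f"{50 * 8 / elapsed:.3f} samples/sec")
```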