Update README.md
README.md
This is an INT8 PyTorch model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).

The original fp32 model comes from the fine-tuned model [Intel/bert-base-uncased-mrpc](https://huggingface.co/Intel/bert-base-uncased-mrpc).

The calibration dataloader is the train dataloader. The calibration sampling size is 1000.

The linear module **bert.encoder.layer.9.output.dense** falls back to fp32 to meet the 1% relative accuracy loss.
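
The recipe above (static post-training quantization with a calibration sampling size of 1000, the **bert.encoder.layer.9.output.dense** linear kept in fp32, and a 1% relative accuracy criterion) could be expressed roughly as in the sketch below using the Intel® Neural Compressor 2.x Python API. This is a minimal sketch, not the exact script used to produce this model; `train_dataloader` and `eval_f1` are placeholders for an MRPC train dataloader and an eval-f1 metric function.

```python
# Minimal sketch (assumes the Intel Neural Compressor 2.x API); not the original recipe script.
from transformers import AutoModelForSequenceClassification
from neural_compressor import PostTrainingQuantConfig, quantization
from neural_compressor.config import AccuracyCriterion

fp32_model = AutoModelForSequenceClassification.from_pretrained("Intel/bert-base-uncased-mrpc")

conf = PostTrainingQuantConfig(
    approach="static",
    calibration_sampling_size=[1000],  # calibration sampling size = 1000
    op_name_dict={
        # keep this linear module in fp32
        "bert.encoder.layer.9.output.dense": {
            "activation": {"dtype": ["fp32"]},
            "weight": {"dtype": ["fp32"]},
        },
    },
    # stop tuning once the relative accuracy loss is within 1%
    accuracy_criterion=AccuracyCriterion(criterion="relative", tolerable_loss=0.01),
)

# Placeholders: the MRPC train dataloader is used for calibration, and eval_f1
# returns the eval-f1 score that the accuracy criterion is checked against.
int8_model = quantization.fit(
    model=fp32_model,
    conf=conf,
    calib_dataloader=train_dataloader,
    eval_func=eval_f1,
)
int8_model.save("./int8-bert-base-uncased-mrpc")
```
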
### Test result

| |INT8|FP32|
|---|:---:|:---:|
| **Accuracy (eval-f1)** |0.8959|0.9042|
| **Model size (MB)** |119|418|

### Load with Intel® Neural Compressor:
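
A minimal loading sketch, assuming the `optimum-intel` integration of Intel® Neural Compressor is installed; the repository ID below is an assumption for this INT8 model and may need to be replaced with the actual Hub ID.

```python
from optimum.intel import INCModelForSequenceClassification

# Assumed Hub repository ID for this INT8 model; replace if it differs.
int8_model = INCModelForSequenceClassification.from_pretrained(
    "Intel/bert-base-uncased-mrpc-int8-static"
)
```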