---
language: en
license: apache-2.0
tags:
- text-classification
- int8
- Intel® Neural Compressor
- neural-compressor
- PostTrainingStatic
datasets: 
- mrpc
metrics:
- f1
---

# INT8 BERT base uncased finetuned MRPC

## Post-training static quantization

### PyTorch

This is an INT8 PyTorch model quantized with [huggingface/optimum-intel](https://github.com/huggingface/optimum-intel) using [Intel® Neural Compressor](https://github.com/intel/neural-compressor).

The original FP32 model is the fine-tuned model [Intel/bert-base-uncased-mrpc](https://huggingface.co/Intel/bert-base-uncased-mrpc).

The calibration dataloader is the train dataloader. The calibration sampling size is 1000.

The linear module **bert.encoder.layer.9.output.dense** falls back to FP32 to keep the relative accuracy loss within 1%.
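
To illustrate what calibration does in post-training static quantization, here is a minimal sketch in plain Python. It assumes simple min/max range collection and an asymmetric 8-bit (0..255) scheme. The helper names (`calibrate_min_max`, `affine_qparams`, `quantize`, `dequantize`) are hypothetical, and this is not necessarily how Intel® Neural Compressor computes its quantization parameters.

```python
# Illustrative sketch only: min/max calibration with asymmetric uint8
# quantization. All helper names are hypothetical and not part of any library.

def calibrate_min_max(batches):
    """Track the smallest and largest activation seen over calibration batches."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi

def affine_qparams(lo, hi, qmin=0, qmax=255):
    """Derive a fixed scale and zero point from the calibrated range."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must contain 0.0
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to an 8-bit integer, clamping to the representable range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Map an 8-bit integer back to an approximate float."""
    return (q - zero_point) * scale

# Two toy "calibration batches" of activations stand in for real samples.
lo, hi = calibrate_min_max([[-1.0, 0.5], [2.0, 0.0]])
scale, zero_point = affine_qparams(lo, hi)
q = quantize(0.5, scale, zero_point)
approx = dequantize(q, scale, zero_point)
```

Because the scale and zero point are fixed after calibration, values outside the calibrated range are clamped at inference time, which is one reason the choice and size of the calibration set matter.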

#### Test result

|   |INT8|FP32|
|---|:---:|:---:|
| **Accuracy (eval-f1)** |0.8959|0.9042|
| **Model size (MB)**  |119|418|
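
As a quick check of the 1% criterion, the relative F1 loss can be computed from the table values. This is a small worked calculation, assuming the criterion is measured as the F1 drop relative to the FP32 score; the variable names are only for illustration.

```python
# Values copied from the PyTorch test result table.
fp32_f1 = 0.9042
int8_f1 = 0.8959

# Relative loss: the drop in F1 as a fraction of the FP32 score (assumed definition).
relative_loss = (fp32_f1 - int8_f1) / fp32_f1
meets_criterion = relative_loss <= 0.01

print(f"relative F1 loss: {relative_loss:.2%}")  # relative F1 loss: 0.92%
print(f"within 1%: {meets_criterion}")           # within 1%: True
```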

#### Load with Intel® Neural Compressor:

```python
from optimum.intel import INCModelForSequenceClassification

model_id = "Intel/bert-base-uncased-mrpc-int8-static"
int8_model = INCModelForSequenceClassification.from_pretrained(model_id)
```

### ONNX


This is an INT8 ONNX model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).

The original FP32 model is the fine-tuned model [Intel/bert-base-uncased-mrpc](https://huggingface.co/Intel/bert-base-uncased-mrpc).

The calibration dataloader is the eval dataloader. The calibration sampling size is 100.

#### Test result

|   |INT8|FP32|
|---|:---:|:---:|
| **Accuracy (eval-f1)** |0.9021|0.9042|
| **Model size (MB)**  |236|418|
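
The two result tables can be compared side by side to see the size and accuracy trade-off of each variant. The sketch below uses only the numbers reported in this card; the `summarize` helper is hypothetical.

```python
# FP32 baseline from the tables in this card.
FP32_F1 = 0.9042
FP32_MB = 418

def summarize(int8_f1, int8_mb):
    """Return (size reduction factor, relative F1 drop) versus the FP32 model."""
    size_ratio = FP32_MB / int8_mb
    f1_drop = (FP32_F1 - int8_f1) / FP32_F1
    return size_ratio, f1_drop

variants = {
    "PyTorch": summarize(0.8959, 119),
    "ONNX": summarize(0.9021, 236),
}

for name, (size_ratio, f1_drop) in variants.items():
    print(f"{name}: {size_ratio:.2f}x smaller, {f1_drop:.2%} relative F1 drop")
# PyTorch: 3.51x smaller, 0.92% relative F1 drop
# ONNX: 1.77x smaller, 0.23% relative F1 drop
```

On these figures, the ONNX model gives up less accuracy but is about twice the size of the PyTorch INT8 model.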


#### Load ONNX model:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
model = ORTModelForSequenceClassification.from_pretrained('Intel/bert-base-uncased-mrpc-int8-static')
```