---
pipeline_tag: token-classification
datasets:
- conll2003
metrics:
- precision
- recall
- f1
- accuracy
tags:
- distilbert
---
* **Task:** `token-classification`
* **Backend:** `sagemaker-training`
* **Backend args:** `{'instance_type': 'ml.m5.2xlarge', 'supported_instructions': 'avx512'}`
* **Number of evaluation samples:** `All dataset`

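The backend arguments record `avx512` as the supported instruction set of the benchmark instance. On an x86 Linux machine, one way to check whether a local CPU offers the same is to look for the `avx512f` (AVX-512 Foundation) flag in `/proc/cpuinfo`. The sketch below assumes that Linux-specific file layout; the helper names are hypothetical and not part of any benchmarking tool.

```python
from pathlib import Path
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text (hypothetical helper)."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        # x86 Linux lists features on lines such as "flags\t\t: fpu vme ... avx512f ..."
        if line.startswith("flags"):
            _, _, values = line.partition(":")
            flags.update(values.split())
    return flags


def has_avx512(cpuinfo_text: str) -> bool:
    # avx512f is the base subset implemented by every AVX-512 capable CPU.
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # assumption: Linux host
    if path.exists():
        print("AVX-512 available:", has_avx512(path.read_text()))
```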

Fixed parameters:

* **model_name_or_path**: `elastic/distilbert-base-uncased-finetuned-conll03-english`
* **dataset**:
    * **path**: `conll2003`
    * **eval_split**: `validation`
    * **data_keys**: `{'primary': 'tokens'}`
    * **ref_keys**: `['ner_tags']`
    * **calibration_split**: `train`
* **node_exclusion**: `[]`
* **per_channel**: `False`
* **calibration**:
    * **method**: `minmax`
    * **num_calibration_samples**: `100`
* **framework**: `onnxruntime`
* **framework_args**:
    * **opset**: `11`
    * **optimization_level**: `1`
* **aware_training**: `False`

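With `method: minmax`, static calibration runs the model over the calibration samples (here 100 examples from the `train` split), tracks the smallest and largest value an activation takes, and turns that range into a scale and zero point. A minimal sketch of that arithmetic for asymmetric uint8 quantization in plain Python; it assumes a single tensor and a range widened to include 0, and the helper names are hypothetical rather than ONNX Runtime's API.

```python
from typing import Iterable, List, Tuple


def minmax_range(batches: Iterable[List[float]]) -> Tuple[float, float]:
    """Track the global min/max of an activation over all calibration batches."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    # Widen the range to include 0 so that 0.0 is exactly representable.
    return min(lo, 0.0), max(hi, 0.0)


def uint8_params(lo: float, hi: float) -> Tuple[float, int]:
    """Scale and zero point mapping [lo, hi] onto the uint8 range [0, 255]."""
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, max(0, min(255, zero_point))


def quantize(x: float, scale: float, zero_point: int) -> int:
    # Round to the nearest quantization step, then clip to uint8.
    return max(0, min(255, round(x / scale) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    return (q - zero_point) * scale


if __name__ == "__main__":
    calibration = [[-1.0, 0.5, 2.0], [0.0, 3.0, -0.5]]  # toy activation values
    lo, hi = minmax_range(calibration)
    scale, zp = uint8_params(lo, hi)
    print(lo, hi, scale, zp, quantize(0.5, scale, zp))
```

Every value seen during calibration maps into `[0, 255]`; values outside the calibrated range would be clipped at inference time.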

Benchmarked parameters:

* **quantization_approach**: `dynamic`, `static`
* **operators_to_quantize**: `['Add', 'MatMul']`, `['Add']`

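The two benchmarked approaches differ in when activation ranges are fixed. Dynamic quantization derives an activation's scale and zero point from the tensor itself at inference time, while static quantization reuses the range recorded during calibration, so inputs outside that range are clipped. A plain-Python sketch of the contrast, assuming the same asymmetric uint8 scheme for both; the function names are hypothetical.

```python
from typing import List, Tuple


def params(lo: float, hi: float) -> Tuple[float, int]:
    """Asymmetric uint8 scale/zero point for a range widened to include 0."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    return scale, max(0, min(255, round(-lo / scale)))


def fake_quant(xs: List[float], scale: float, zp: int) -> List[float]:
    """Quantize to uint8 and immediately dequantize, exposing rounding and clipping error."""
    out = []
    for x in xs:
        q = max(0, min(255, round(x / scale) + zp))
        out.append((q - zp) * scale)
    return out


def dynamic_quant(xs: List[float]) -> List[float]:
    # Range taken from the current tensor, computed at inference time.
    return fake_quant(xs, *params(min(xs), max(xs)))


def static_quant(xs: List[float], calib_lo: float, calib_hi: float) -> List[float]:
    # Range fixed ahead of time from calibration data.
    return fake_quant(xs, *params(calib_lo, calib_hi))


if __name__ == "__main__":
    activation = [-0.5, 0.2, 6.0]  # 6.0 lies outside the calibrated range below
    print("dynamic:", dynamic_quant(activation))
    print("static: ", static_quant(activation, -1.0, 3.0))
```

In this toy case the dynamic variant reproduces 6.0 closely, while the static variant clips it to roughly 3.0.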
# Evaluation
## Non-time metrics
| quantization_approach | operators_to_quantize | | precision (original) | precision (optimized) | | recall (original) | recall (optimized) | | f1 (original) | f1 (optimized) | | accuracy (original) | accuracy (optimized) |
| :-------------------: | :-------------------: | :-: | :------------------: | :-------------------: | :-: | :---------------: | :----------------: | :-: | :-----------: | :------------: | :-: | :-----------------: | :------------------: |
| `dynamic` | `['Add', 'MatMul']` | \| | 0.936 | 0.935 | \| | 0.944 | 0.943 | \| | 0.940 | 0.939 | \| | 0.988 | 0.988 |
| `dynamic` | `['Add']` | \| | 0.936 | 0.936 | \| | 0.944 | 0.944 | \| | 0.940 | 0.940 | \| | 0.988 | 0.988 |
| `static` | `['Add', 'MatMul']` | \| | 0.936 | 0.063 | \| | 0.944 | 0.246 | \| | 0.940 | 0.100 | \| | 0.988 | 0.343 |
| `static` | `['Add']` | \| | 0.936 | 0.050 | \| | 0.944 | 0.160 | \| | 0.940 | 0.076 | \| | 0.988 | 0.311 |
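
F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Recomputing it from the optimized precision and recall columns reproduces the reported F1 to within rounding, a quick check that the columns are mutually consistent. The values below are copied from the table; the dictionary layout is only for illustration.

```python
# (precision, recall, reported F1) from the "optimized" columns of the table.
ROWS = {
    ("dynamic", "Add+MatMul"): (0.935, 0.943, 0.939),
    ("dynamic", "Add"): (0.936, 0.944, 0.940),
    ("static", "Add+MatMul"): (0.063, 0.246, 0.100),
    ("static", "Add"): (0.050, 0.160, 0.076),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for cfg, (p, r, reported) in ROWS.items():
        # Inputs are themselves rounded to 3 decimals, so expect tiny differences.
        print(cfg, f"recomputed {f1(p, r):.3f}, reported {reported:.3f}")
```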
## Time metrics
Time benchmarks were run for 15 seconds per config.
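
A time-boxed benchmark of this kind can be sketched as a loop that calls the model until the budget runs out, then reports mean latency and completed calls per second. This is an assumption about the methodology, not the harness that produced the tables, although the reported throughputs are consistent with completed calls divided by the nominal 15-second budget (for example, 2.53/s at batch size 8, input length 128 corresponds to 38 calls in 15 s). `run_once` is a hypothetical stand-in for one forward pass on one batch.

```python
import time
from statistics import mean
from typing import Callable, Dict


def timed_benchmark(run_once: Callable[[], object], duration_s: float = 15.0) -> Dict[str, float]:
    """Call run_once repeatedly until duration_s has elapsed (hypothetical harness)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    latencies_ms = []
    start = time.perf_counter()
    # The budget is checked before each call, so the last call may overrun slightly.
    while time.perf_counter() - start < duration_s:
        t0 = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    return {
        "latency_mean_ms": mean(latencies_ms),
        # Completed calls (one batch each) per second of the nominal budget.
        "throughput_per_s": len(latencies_ms) / duration_s,
        "num_runs": len(latencies_ms),
    }


if __name__ == "__main__":
    # Stand-in workload: sleep about 10 ms per "forward pass".
    print(timed_benchmark(lambda: time.sleep(0.01), duration_s=0.5))
```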

Time metrics for batch size 1, input length 32:

| quantization_approach | operators_to_quantize | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | \| | 46.38 | 9.96 | \| | 21.60 | 100.47 |
| `dynamic` | `['Add']` | \| | 36.59 | 13.98 | \| | 27.33 | 71.60 |
| `static` | `['Add', 'MatMul']` | \| | 33.84 | 14.46 | \| | 29.60 | 69.20 |
| `static` | `['Add']` | \| | 33.23 | 20.11 | \| | 30.13 | 49.73 |
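
Speedup can be read as the ratio of original to optimized mean latency. Applied to the batch size 1, input length 32 rows (values copied from the table; the dictionary layout is illustrative):

```python
# Mean latency in ms: (original, optimized), batch size 1, input length 32.
LATENCY_B1_L32 = {
    ("dynamic", "Add+MatMul"): (46.38, 9.96),
    ("dynamic", "Add"): (36.59, 13.98),
    ("static", "Add+MatMul"): (33.84, 14.46),
    ("static", "Add"): (33.23, 20.11),
}


def speedups(latencies):
    """Original / optimized latency; values above 1.0 mean the optimized model is faster."""
    return {cfg: orig / opt for cfg, (orig, opt) in latencies.items()}


if __name__ == "__main__":
    for cfg, s in speedups(LATENCY_B1_L32).items():
        print(cfg, f"{s:.2f}x")
```

The original-model latency itself varies from row to row (33.23 to 46.38 ms here), so modest ratios should be read with that variation in mind.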

Time metrics for batch size 1, input length 64:

| quantization_approach | operators_to_quantize | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | \| | 58.92 | 19.68 | \| | 17.00 | 50.87 |
| `dynamic` | `['Add']` | \| | 58.59 | 24.81 | \| | 17.13 | 40.33 |
| `static` | `['Add', 'MatMul']` | \| | 51.41 | 29.36 | \| | 19.47 | 34.07 |
| `static` | `['Add']` | \| | 44.22 | 38.56 | \| | 22.67 | 25.93 |

Time metrics for batch size 1, input length 128:

| quantization_approach | operators_to_quantize | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | \| | 72.38 | 36.47 | \| | 13.87 | 27.47 |
| `dynamic` | `['Add']` | \| | 70.21 | 46.30 | \| | 14.27 | 21.60 |
| `static` | `['Add', 'MatMul']` | \| | 70.76 | 48.24 | \| | 14.13 | 20.80 |
| `static` | `['Add']` | \| | 72.47 | 71.10 | \| | 13.80 | 14.07 |

Time metrics for batch size 4, input length 32:

| quantization_approach | operators_to_quantize | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | \| | 69.76 | 38.50 | \| | 14.40 | 26.00 |
| `dynamic` | `['Add']` | \| | 56.02 | 51.32 | \| | 17.87 | 19.53 |
| `static` | `['Add', 'MatMul']` | \| | 55.05 | 46.80 | \| | 18.20 | 21.40 |
| `static` | `['Add']` | \| | 71.03 | 56.82 | \| | 14.13 | 17.67 |

Time metrics for batch size 4, input length 64:

| quantization_approach | operators_to_quantize | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | \| | 119.91 | 61.51 | \| | 8.40 | 16.27 |
| `dynamic` | `['Add']` | \| | 108.43 | 105.65 | \| | 9.27 | 9.47 |
| `static` | `['Add', 'MatMul']` | \| | 119.89 | 86.76 | \| | 8.40 | 11.53 |
| `static` | `['Add']` | \| | 96.99 | 102.03 | \| | 10.33 | 9.87 |

Time metrics for batch size 4, input length 128:

| quantization_approach | operators_to_quantize | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | \| | 219.78 | 123.71 | \| | 4.60 | 8.13 |
| `dynamic` | `['Add']` | \| | 220.13 | 187.21 | \| | 4.60 | 5.40 |
| `static` | `['Add', 'MatMul']` | \| | 186.39 | 176.99 | \| | 5.40 | 5.67 |
| `static` | `['Add']` | \| | 219.57 | 203.71 | \| | 4.60 | 4.93 |

Time metrics for batch size 8, input length 32:

| quantization_approach | operators_to_quantize | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | \| | 118.32 | 59.22 | \| | 8.47 | 16.93 |
| `dynamic` | `['Add']` | \| | 116.52 | 80.17 | \| | 8.60 | 12.53 |
| `static` | `['Add', 'MatMul']` | \| | 116.59 | 83.55 | \| | 8.60 | 12.00 |
| `static` | `['Add']` | \| | 115.81 | 126.53 | \| | 8.67 | 7.93 |

Time metrics for batch size 8, input length 64:

| quantization_approach | operators_to_quantize | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | \| | 172.71 | 117.89 | \| | 5.80 | 8.53 |
| `dynamic` | `['Add']` | \| | 166.05 | 156.99 | \| | 6.07 | 6.40 |
| `static` | `['Add', 'MatMul']` | \| | 215.00 | 148.93 | \| | 4.67 | 6.73 |
| `static` | `['Add']` | \| | 214.55 | 200.16 | \| | 4.67 | 5.00 |

Time metrics for batch size 8, input length 128:

| quantization_approach | operators_to_quantize | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | \| | 403.69 | 307.36 | \| | 2.53 | 3.27 |
| `dynamic` | `['Add']` | \| | 372.85 | 317.53 | \| | 2.73 | 3.20 |
| `static` | `['Add', 'MatMul']` | \| | 352.18 | 320.85 | \| | 2.87 | 3.13 |
| `static` | `['Add']` | \| | 403.55 | 410.17 | \| | 2.53 | 2.47 |
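
Throughput in these tables appears to count timed calls (batches) rather than individual sequences: it stays close to 1000 / latency_mean in each row, e.g. 1000 / 46.38 ms ≈ 21.56 against the reported 21.60. Under that assumption, comparing batch sizes requires multiplying by the batch size. A sketch using the optimized `dynamic` / `['Add', 'MatMul']` rows at input length 32 (values copied from the tables):

```python
# Optimized throughput in calls/s (assumed to be batches/s), dynamic, ['Add', 'MatMul'], input length 32.
THROUGHPUT_L32 = {1: 100.47, 4: 26.00, 8: 16.93}


def sequences_per_second(throughput_by_batch):
    """Convert batches/s to sequences/s, assuming each timed call processes one full batch."""
    return {bs: tp * bs for bs, tp in throughput_by_batch.items()}


if __name__ == "__main__":
    for bs, sps in sequences_per_second(THROUGHPUT_L32).items():
        print(f"batch size {bs}: {sps:.1f} sequences/s")
```

Read this way, batch size 8 processes more sequences per second than batch size 1 even though each call takes longer (59.22 ms versus 9.96 ms).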