---
tags:
- fp8
- vllm
---
Meta-Llama-3-8B-Instruct quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.5.0.
Produced using [AutoFP8 with calibration samples from ultrachat](https://github.com/neuralmagic/AutoFP8/blob/147fa4d9e1a90ef8a93f96fc7d9c33056ddc017a/example_dataset.py).
Accuracy on MMLU:
```
vllm (pretrained=meta-llama/Meta-Llama-3-8B-Instruct,gpu_memory_utilization=0.4), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16
| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu |N/A |none | 0|acc |0.6569|± |0.0038|
| - humanities |N/A |none | 5|acc |0.6049|± |0.0068|
| - other |N/A |none | 5|acc |0.7203|± |0.0078|
| - social_sciences|N/A |none | 5|acc |0.7663|± |0.0075|
| - stem |N/A |none | 5|acc |0.5652|± |0.0085|
vllm (pretrained=nm-testing/Meta-Llama-3-8B-Instruct-FP8,quantization=fp8,gpu_memory_utilization=0.4), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16
| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu |N/A |none | 0|acc |0.6567|± |0.0038|
| - humanities |N/A |none | 5|acc |0.6072|± |0.0068|
| - other |N/A |none | 5|acc |0.7206|± |0.0078|
| - social_sciences|N/A |none | 5|acc |0.7618|± |0.0075|
| - stem |N/A |none | 5|acc |0.5649|± |0.0085|
```
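The FP8 row of the results above can be reproduced with lm-evaluation-harness; this command is reconstructed from the header line printed in the log (model args, 5-shot, batch size 16), so the exact flags may differ from the original invocation:

```shell
# Reproduction sketch, inferred from the eval header above (not the verbatim command).
lm_eval --model vllm \
  --model_args pretrained=nm-testing/Meta-Llama-3-8B-Instruct-FP8,quantization=fp8,gpu_memory_utilization=0.4 \
  --tasks mmlu \
  --num_fewshot 5 \
  --batch_size 16
```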