tags:
  - fp8
  - vllm

Meta-Llama-3-8B-Instruct-FP8

Model Overview

This is Meta-Llama-3-8B-Instruct with both weights and activations quantized to FP8 using per-tensor quantization. It is ready for inference with vLLM >= 0.5.0.
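
To make "per-tensor quantization" concrete, here is a minimal sketch in plain Python. It assumes the E4M3 flavor of FP8 (3 mantissa bits, largest finite value 448), which this card does not state. The helper names are hypothetical, and this is neither vLLM's kernel nor the procedure used to produce this checkpoint. The idea is that one scale is shared by the whole tensor, chosen so the largest magnitude maps to the FP8 maximum.

```python
import math

# Assumption: FP8 means the E4M3 format, whose largest finite value is 448.
FP8_E4M3_MAX = 448.0


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    x = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))
    _, e = math.frexp(abs(x))      # abs(x) = m * 2**e with 0.5 <= m < 1
    exponent = max(e - 1, -6)      # -6 is the smallest normal exponent in E4M3
    step = 2.0 ** (exponent - 3)   # spacing of values with 3 mantissa bits
    return round(x / step) * step


def quantize_per_tensor(values):
    """Quantize a flat list of floats using a single scale for the whole tensor."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Map quantized values back to the original range."""
    return [q * scale for q in quantized]


weights = [0.02, -0.5, 1.75, -3.2, 0.0007]   # toy tensor, not real model weights
quantized, scale = quantize_per_tensor(weights)
restored = dequantize(quantized, scale)
```

Because a single scale covers the entire tensor, one large outlier reduces the resolution available to every other value; that is the usual trade-off of per-tensor scaling against finer-grained schemes.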

Benchmark	Meta-Llama-3-8B-Instruct	Meta-Llama-3-8B-Instruct-FP8 (this model)
arc-c 25-shot	62.54	61.77
hellaswag 10-shot	78.83	78.56
mmlu 5-shot	66.60	66.27
truthfulqa 0-shot	52.44	52.35
winogrande 5-shot	75.93	76.40
gsm8k 5-shot	75.96	73.99
Average Accuracy	68.71	68.22
Recovery	100%	99.28%
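
The last two rows can be recomputed from the per-benchmark scores. The sketch below assumes that "Average Accuracy" is the unweighted mean of the six benchmarks and that "Recovery" is the FP8 average divided by the baseline average; the card does not define either term, so treat both as assumptions.

```python
# Scores copied from the table above.
baseline = {
    "arc-c": 62.54, "hellaswag": 78.83, "mmlu": 66.60,
    "truthfulqa": 52.44, "winogrande": 75.93, "gsm8k": 75.96,
}
fp8 = {
    "arc-c": 61.77, "hellaswag": 78.56, "mmlu": 66.27,
    "truthfulqa": 52.35, "winogrande": 76.40, "gsm8k": 73.99,
}


def mean(scores):
    """Unweighted mean over all benchmarks (assumed definition of the average)."""
    return sum(scores.values()) / len(scores)


baseline_avg = mean(baseline)
fp8_avg = mean(fp8)

# Assumed definition: recovery is the ratio of the two averages, in percent.
recovery = 100.0 * fp8_avg / baseline_avg

print(f"baseline avg: {baseline_avg:.3f}")
print(f"fp8 avg:      {fp8_avg:.3f}")
print(f"recovery:     {recovery:.2f}%")
```

Under these assumptions the recovery comes out at about 99.28%, matching the table. The baseline mean is about 68.717, which the table lists as 68.71, so the last digit there appears to be truncated rather than rounded.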