---
tags:
- fp8
- vllm
---
# Mixtral-8x22B-Instruct-v0.1-FP8

## Model Overview
Mixtral-8x22B-Instruct-v0.1 quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.5.0.
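Per-tensor quantization uses a single scale for the whole tensor, chosen so the largest magnitude maps onto the edge of the FP8 dynamic range. A minimal NumPy sketch of the idea (the mantissa rounding below is a crude stand-in for real FP8 E4M3 behavior and ignores subnormals and NaNs; it is illustrative, not the AutoFP8 implementation):

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def round_to_e4m3_mantissa(x):
    # Crude stand-in for FP8 rounding: keep 4 significant bits
    # (1 implicit + 3 mantissa bits); subnormals/NaNs are ignored.
    m, e = np.frexp(x)
    m = np.round(m * 16) / 16  # frexp mantissa lies in [0.5, 1)
    return np.ldexp(m, e)

def quantize_per_tensor(x):
    # One scale for the entire tensor ("per-tensor"): map the largest
    # magnitude onto the FP8 dynamic-range limit.
    scale = np.abs(x).max() / FP8_E4M3_MAX
    q = round_to_e4m3_mantissa(np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX))
    return q, scale

def dequantize(q, scale):
    return q * scale

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = quantize_per_tensor(x)
x_hat = dequantize(q, scale)
```

With 4 significant bits, the round-trip error stays within roughly 1/16 of each value's magnitude, which is the precision/throughput trade-off FP8 makes.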
## Usage and Creation
Produced using AutoFP8 with calibration samples from the UltraChat dataset.
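The model can be served with vLLM's offline inference API. A minimal sketch (the checkpoint path and `tensor_parallel_size` below are illustrative; substitute the actual repository id for this model and the number of available GPUs):

```python
from vllm import LLM, SamplingParams

# Hypothetical checkpoint path; replace with this model's actual repo id.
llm = LLM(
    model="Mixtral-8x22B-Instruct-v0.1-FP8",
    tensor_parallel_size=4,  # adjust to the available GPUs
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["[INST] What is FP8 quantization? [/INST]"], sampling_params
)
print(outputs[0].outputs[0].text)
```

vLLM >= 0.5.0 detects the FP8 checkpoint format automatically and runs weights and activations in FP8 on supported hardware.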
## Evaluation

Open LLM Leaderboard evaluation scores:
| Benchmark | Mixtral-8x22B-Instruct-v0.1 | Mixtral-8x22B-Instruct-v0.1-FP8 (this model) |
| --- | --- | --- |
| arc-c 25-shot | 72.70 | 69.19 |
| hellaswag 10-shot | 89.08 | 82.49 |
| mmlu 5-shot | 77.77 | 70.61 |
| truthfulqa 0-shot | 68.14 | 65.73 |
| winogrande 5-shot | 85.16 | 82.63 |
| gsm8k 5-shot | 82.03 | 76.57 |
| **Average Accuracy** | **79.15** | **74.53** |
| **Recovery** | **100%** | **94.17%** |
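Recovery appears to be the FP8 model's average accuracy expressed as a percentage of the baseline average. A quick check against the table values (small rounding differences in the last digit aside):

```python
# Per-benchmark scores from the table above (baseline, then FP8).
baseline = [72.70, 89.08, 77.77, 68.14, 85.16, 82.03]
fp8 = [69.19, 82.49, 70.61, 65.73, 82.63, 76.57]

avg_baseline = sum(baseline) / len(baseline)  # ~79.15
avg_fp8 = sum(fp8) / len(fp8)                 # ~74.54 (table reports 74.53)
recovery = 100 * avg_fp8 / avg_baseline       # ~94.17%
```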