mgoin committed
Commit 7b86662
1 Parent(s): d799a09

Update README.md

Files changed (1)
  1. README.md +2 -6
README.md CHANGED
@@ -4,13 +4,9 @@ tags:
 ---
 
 
-Meta-Llama-3-8B-Instruct quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.4.3.
+Meta-Llama-3-8B-Instruct quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.5.0.
 
-Produced using https://github.com/neuralmagic/AutoFP8/blob/b0c1f789c51659bb023c06521ecbd04cea4a26f6/quantize.py
-
-```bash
-python quantize.py --model-id meta-llama/Meta-Llama-3-8B-Instruct --save-dir Meta-Llama-3-8B-Instruct-FP8
-```
+Produced using [AutoFP8 with calibration samples from ultrachat](https://github.com/neuralmagic/AutoFP8/blob/147fa4d9e1a90ef8a93f96fc7d9c33056ddc017a/example_dataset.py).
 
 Accuracy on MMLU:
 ```