Update README.md
README.md
CHANGED
@@ -70,6 +70,9 @@ Give me a detailed list of the attractions I should visit, and time it takes in
 | quantized-llama-3-70b-pp1-tp8-awq-w4a16-kvint8-gs64 | 64 | int8 | 1 | 8 | int4 | fp16 |
 
 
+## TRT-LLM and AMMO
+- TRT-LLM rel 0.9 a9356d4b7610330e89c1010f342a9ac644215c52
+
 ## Quantize script
 ```
 python ../quantization/quantize.py --model_dir /root/.cache/huggingface/hub/models--casperhansen--llama-3-70b-fp16/snapshots/c8647dcc2296eb8d763645645ebda784da16141a --dtype float16 --qformat int4_awq --awq_block_size 64 --output_dir ./quantized-llama-3-70b-pp1-tp8-awq-w4a16-kvfp16-gs64 --batch_size 8 --tp_size 8 --pp_size 1 --calib_size 512