TheFloat16/Llama3-70b-Instruct-TRTLLM

matichon committed on May 24, 2024

Commit e2dde48 · verified · 1 Parent(s): 5d3f32f

Update README.md

Files changed (1)

README.md +3 -0

@@ -70,6 +70,9 @@ Give me a detailed list of the attractions I should visit, and time it takes in
 | quantized-llama-3-70b-pp1-tp8-awq-w4a16-kvint8-gs64 | 64 | int8 | 1 | 8 | int4 | fp16 |
 ## Quantize script
 ```
 python ../quantization/quantize.py --model_dir /root/.cache/huggingface/hub/models--casperhansen--llama-3-70b-fp16/snapshots/c8647dcc2296eb8d763645645ebda784da16141a --dtype float16 --qformat int4_awq --awq_block_size 64 --output_dir ./quantized-llama-3-70b-pp1-tp8-awq-w4a16-kvfp16-gs64 --batch_size 8 --tp_size 8 --pp_size 1 --calib_size 512

 | quantized-llama-3-70b-pp1-tp8-awq-w4a16-kvint8-gs64 | 64 | int8 | 1 | 8 | int4 | fp16 |
+## TRT-LLM and AMMO
+- TRT-LLM rel 0.9 a9356d4b7610330e89c1010f342a9ac644215c52
 ## Quantize script
 ```
 python ../quantization/quantize.py --model_dir /root/.cache/huggingface/hub/models--casperhansen--llama-3-70b-fp16/snapshots/c8647dcc2296eb8d763645645ebda784da16141a --dtype float16 --qformat int4_awq --awq_block_size 64 --output_dir ./quantized-llama-3-70b-pp1-tp8-awq-w4a16-kvfp16-gs64 --batch_size 8 --tp_size 8 --pp_size 1 --calib_size 512