ss-galileo/llama3.1-8b


ss-galileo committed on Oct 14, 2024

Commit d99c812 · verified · 1 Parent(s): 4dcb0a3

Update README.md

Files changed (1)

README.md +7 -1

@@ -18,4 +18,10 @@ python TensorRT-LLM/examples/quantization/quantize.py --model_dir ./Meta-Llama-3
 trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_int4_awq --output_dir ./tmp/llama/8B/trt_engines/int4_awq/1-gpu  --gpt_attention_plugin auto  --gemm_plugin auto  --max_num_tokens 65536 --max_input_len 1048576 --max_batch_size 64 --gather_generation_logits
 ```
-Upload `./tmp/llama/8B/trt_engines/int4_awq/1-gpu/*` and the "tokenizer.json", "tokenizer_config.json", "special_tokens_map.json" from meta-llama/Meta-Llama-3.1-8B-Instruct

 trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_int4_awq --output_dir ./tmp/llama/8B/trt_engines/int4_awq/1-gpu  --gpt_attention_plugin auto  --gemm_plugin auto  --max_num_tokens 65536 --max_input_len 1048576 --max_batch_size 64 --gather_generation_logits
 ```
+Upload
+```
+huggingface-cli upload ss-galileo/llama3.1-8b  ./tmp/llama/8B/trt_engines/int4_awq/1-gpu/rank0.engine rank0.engine
+huggingface-cli upload ss-galileo/llama3.1-8b  ./tmp/llama/8B/trt_engines/int4_awq/1-gpu/config.json config.json
+```
+and the "tokenizer.json", "tokenizer_config.json", "special_tokens_map.json" from meta-llama/Meta-Llama-3.1-8B-Instruct
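
The added README text gives explicit upload commands for the engine and its config, but leaves the three tokenizer files as a manual step. The sketch below is one possible way to script the whole sequence; it is not part of this commit. It only assembles `huggingface-cli download` and `huggingface-cli upload` command lines and prints them, so nothing is transferred until you run them yourself. The local `./tokenizer` directory and the `build_commands` helper are assumptions made for illustration. The repository names, engine path, and file names come from the diff above.

```python
import shlex

# File names taken from the README change in this commit.
ENGINE_FILES = ["rank0.engine", "config.json"]
TOKENIZER_FILES = [
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
]


def build_commands(repo_id, engine_dir, tokenizer_repo, tokenizer_dir):
    """Return argv lists: one tokenizer download, then one upload per file.

    Hypothetical helper for illustration; it builds commands but runs nothing.
    """
    engine_dir = engine_dir.rstrip("/")
    tokenizer_dir = tokenizer_dir.rstrip("/")

    # Fetch only the tokenizer files from the source model repository.
    commands = [
        ["huggingface-cli", "download", tokenizer_repo, *TOKENIZER_FILES,
         "--local-dir", tokenizer_dir]
    ]
    # Upload form: huggingface-cli upload <repo_id> <local_path> <path_in_repo>
    for name in ENGINE_FILES:
        commands.append(
            ["huggingface-cli", "upload", repo_id, f"{engine_dir}/{name}", name]
        )
    for name in TOKENIZER_FILES:
        commands.append(
            ["huggingface-cli", "upload", repo_id, f"{tokenizer_dir}/{name}", name]
        )
    return commands


if __name__ == "__main__":
    for argv in build_commands(
        repo_id="ss-galileo/llama3.1-8b",
        engine_dir="./tmp/llama/8B/trt_engines/int4_awq/1-gpu",
        tokenizer_repo="meta-llama/Meta-Llama-3.1-8B-Instruct",
        tokenizer_dir="./tokenizer",  # assumed scratch directory
    ):
        print(shlex.join(argv))
```

Running the script prints six commands: one download followed by five uploads. The download step assumes you are logged in with an account that has been granted access to the gated meta-llama repository.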