Update README.md
--- a/README.md
+++ b/README.md
@@ -18,4 +18,10 @@ python TensorRT-LLM/examples/quantization/quantize.py --model_dir ./Meta-Llama-3
 trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_int4_awq --output_dir ./tmp/llama/8B/trt_engines/int4_awq/1-gpu --gpt_attention_plugin auto --gemm_plugin auto --max_num_tokens 65536 --max_input_len 1048576 --max_batch_size 64 --gather_generation_logits
 ```
 
-Upload
+Upload
+```
+huggingface-cli upload ss-galileo/llama3.1-8b ./tmp/llama/8B/trt_engines/int4_awq/1-gpu/rank0.engine rank0.engine
+huggingface-cli upload ss-galileo/llama3.1-8b ./tmp/llama/8B/trt_engines/int4_awq/1-gpu/config.json config.json
+```
+
+and the "tokenizer.json", "tokenizer_config.json", "special_tokens_map.json" from meta-llama/Meta-Llama-3.1-8B-Instruct
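
The README change names three tokenizer files to copy from meta-llama/Meta-Llama-3.1-8B-Instruct but gives no commands for them. One possible way to do it with the same `huggingface-cli` tool is sketched below. The scratch directory `./tokenizer_files`, the `DRY_RUN` switch, and the helper names `run` and `sync_tokenizer_files` are assumptions made for illustration and are not part of the commit. The source repo is gated, so this also assumes `huggingface-cli login` has already been run with an account that has access.

```shell
#!/usr/bin/env bash
# Sketch only: copy the tokenizer files from the base model repo to the engine repo.
set -euo pipefail

SRC_REPO="meta-llama/Meta-Llama-3.1-8B-Instruct"  # named in the README text
DST_REPO="ss-galileo/llama3.1-8b"                 # target of the upload commands in the README
WORK_DIR="${WORK_DIR:-./tokenizer_files}"         # assumption: local scratch directory
DRY_RUN="${DRY_RUN:-1}"                           # assumption: 1 prints commands, 0 runs them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Download each tokenizer file, then upload it under the same name.
sync_tokenizer_files() {
  local f
  for f in tokenizer.json tokenizer_config.json special_tokens_map.json; do
    run huggingface-cli download "$SRC_REPO" "$f" --local-dir "$WORK_DIR"
    run huggingface-cli upload "$DST_REPO" "$WORK_DIR/$f" "$f"
  done
}

sync_tokenizer_files
```

With the default `DRY_RUN=1` the script only prints the six commands it would run, which makes it easy to check the repo names and paths first. Setting `DRY_RUN=0` performs the actual download and upload.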