ss-galileo committed
Commit fb7016b · verified · 1 Parent(s): 8d2649f

Update README.md

Files changed (1)
  1. README.md +21 -3
README.md CHANGED
@@ -1,3 +1,21 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+ ```
+ apt install git-lfs
+ git lfs install
+ git clone https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
+
+
+ sudo apt-get update && sudo apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev
+ pip3 install tensorrt_llm==0.13.0 --extra-index-url https://pypi.nvidia.com
+ git clone -b v0.13.0 https://github.com/NVIDIA/TensorRT-LLM.git
+ ```
+
+ int4 awq:
+ ```
+ python TensorRT-LLM/examples/quantization/quantize.py --model_dir ./Meta-Llama-3.1-8B-Instruct --dtype float16 --qformat int4_awq --batch_size 64 --awq_block_size 128 --output_dir ./tllm_checkpoint_1gpu_int4_awq --calib_size 32
+ trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_int4_awq --output_dir ./tmp/llama/8B/trt_engines/int4_awq/1-gpu --gpt_attention_plugin auto --gemm_plugin auto --max_num_tokens 65536 --max_input_len 1048576 --max_batch_size 64
+ ```
+
+ Upload `./tmp/llama/8B/trt_engines/int4_awq/1-gpu/*`
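
The README's final step says to upload the engine directory but does not show a command. Below is a minimal sketch of one way to do it, not the author's confirmed method. It assumes that a single-GPU `trtllm-build` run writes `config.json` and `rank0.engine` into the output directory, that the `huggingface-cli` tool from `huggingface_hub` is installed and logged in, and that the repo ID is a placeholder you supply. The function names `check_engine_dir` and `upload_engine` are hypothetical helpers introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a TensorRT-LLM engine directory before uploading it.
# Assumption: a 1-GPU build produces config.json and rank0.engine.

# check_engine_dir DIR
# Succeeds only if DIR contains non-empty copies of the expected files.
check_engine_dir() {
  local dir="$1"
  local f
  for f in config.json rank0.engine; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done
  echo "ok: $dir"
}

# upload_engine REPO_ID DIR
# Uploads the contents of DIR to the root of the given Hub repo.
# REPO_ID is a placeholder (hypothetical); run `huggingface-cli login` first.
upload_engine() {
  local repo_id="$1"
  local dir="$2"
  check_engine_dir "$dir" && huggingface-cli upload "$repo_id" "$dir" .
}
```

After sourcing the snippet, a call might look like `upload_engine <your-user>/<your-repo> ./tmp/llama/8B/trt_engines/int4_awq/1-gpu`. Checking the directory first avoids pushing an empty or partial build if `trtllm-build` failed partway.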