ss-galileo/llama3.1-8b

ss-galileo committed on Oct 11, 2024

Commit fb7016b · verified · 1 Parent(s): 8d2649f

Update README.md

Files changed (1)

README.md +21 -3

@@ -1,3 +1,21 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+```
+apt install git-lfs
+git lfs install
+git clone https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
+sudo apt-get update && sudo apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev
+pip3 install tensorrt_llm==0.13.0 --extra-index-url https://pypi.nvidia.com
+git clone -b v0.13.0 https://github.com/NVIDIA/TensorRT-LLM.git
+```
+int4 awq:
+```
+python TensorRT-LLM/examples/quantization/quantize.py --model_dir ./Meta-Llama-3.1-8B-Instruct --dtype float16 --qformat int4_awq  --batch_size 64   --awq_block_size 128  --output_dir ./tllm_checkpoint_1gpu_int4_awq   --calib_size 32
+trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_int4_awq --output_dir ./tmp/llama/8B/trt_engines/int4_awq/1-gpu  --gpt_attention_plugin auto  --gemm_plugin auto  --max_num_tokens 65536 --max_input_len 1048576 --max_batch_size 64
+```
+Upload `./tmp/llama/8B/trt_engines/int4_awq/1-gpu/*`
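
Review note: the last added line only says to upload the engine directory. A minimal sketch of that step follows. It assumes the target is this repository (`ss-galileo/llama3.1-8b`), that `huggingface-cli` from `huggingface_hub` is installed and logged in, and that the single-GPU `trtllm-build` run wrote `config.json` and `rank0.engine` into the output directory. The `check_engine_dir` helper and the `ENGINE_DIR`, `REPO_ID`, and `DO_UPLOAD` variables are hypothetical names for illustration, not part of the commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the commit): confirm the engine directory
# holds the files a single-GPU trtllm-build run is assumed to produce.
check_engine_dir() {
  local dir="$1"
  local f
  for f in config.json rank0.engine; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  return 0
}

# Path taken from the README diff; repo id is an assumption.
ENGINE_DIR="${ENGINE_DIR:-./tmp/llama/8B/trt_engines/int4_awq/1-gpu}"
REPO_ID="${REPO_ID:-ss-galileo/llama3.1-8b}"

# Upload only when explicitly requested, and only if the check passes.
if [ "${DO_UPLOAD:-0}" = "1" ]; then
  check_engine_dir "$ENGINE_DIR" &&
    huggingface-cli upload "$REPO_ID" "$ENGINE_DIR" .
fi
```

Running the script with `DO_UPLOAD=1` would push the directory contents to the repository root. Leaving it unset only defines the helper, which is a cheap way to catch an incomplete build before a multi-gigabyte upload starts.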