cszhzleo/Meta-Llama-3.2-1B-Instruct-nc2-bs1-token1024-neuron-220

cszhzleo committed on Oct 24, 2024

Commit 538a7a3 · verified · 1 parent: ee14a4f

Update README.md

Files changed (1)

README.md +42 -3

@@ -1,3 +1,42 @@
----
-license: mit
----

+---
+license: mit
+---
+### environment
+- optimum-neuron 0.0.25
+- neuron 2.20.0
+- transformers-neuronx 0.12.313
+- transformers 4.45.2
+### export
+```
+optimum-cli export neuron --model meta-llama/Llama-3.2-1B-Instruct --batch_size 1 --sequence_length 1024 --num_cores 2 --auto_cast_type fp16 ./models-hf/meta-llama/Llama-3.2-1B-Instruct
+```
+### run
+```
+docker run -it --name llama-31 --rm \
+   -p 8080:80 \
+   -v /home/ec2-user/models-hf/:/models \
+   -e HF_MODEL_ID=/models/meta-llama/Llama-3.2-1B-Instruct \
+   -e MAX_INPUT_TOKENS=256 \
+   -e MAX_TOTAL_TOKENS=1024 \
+   -e MAX_BATCH_SIZE=1 \
+   -e LOG_LEVEL="info,text_generation_router=debug,text_generation_launcher=debug" \
+   --device=/dev/neuron0 \
+   neuronx-tgi:latest \
+   --model-id /models/meta-llama/Llama-3.2-1B-Instruct \
+   --max-batch-size 1 \
+   --max-input-tokens 256 \
+   --max-total-tokens 1024
+```
+### test
+```
+curl 127.0.0.1:8080/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' -H 'Content-Type: application/json'
+```
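
The same test request can be sent from Python. The following is a minimal sketch using only the standard library, not part of the commit. It assumes the container from the run step is listening on 127.0.0.1:8080 and that the server answers `/generate` with a JSON object containing a `generated_text` field. The helper names `build_generate_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: host and port match the `-p 8080:80` mapping in the run step.
TGI_URL = "http://127.0.0.1:8080/generate"


def build_generate_request(prompt, max_new_tokens=20, url=TGI_URL):
    """Build the POST request that mirrors the curl command in the test step."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_new_tokens=20, url=TGI_URL, timeout=60):
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(prompt, max_new_tokens, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Assumption: the response body is {"generated_text": "..."}.
    return body["generated_text"]
```

With the container running, `print(generate("What is Deep Learning?"))` should print the completion. The prompt plus `max_new_tokens` has to fit within the limits passed to the container (256 input tokens, 1024 total tokens).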