cszhzleo committed 538a7a3 (verified) · 1 parent: ee14a4f

Update README.md

Files changed (1): README.md (+42 -3)

---
license: mit
---
### environment

optimum-neuron 0.0.25

AWS Neuron SDK 2.20.0

transformers-neuronx 0.12.313

transformers 4.45.2

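
To confirm that a Python environment matches the versions listed above, a small check like the following can help. It is a sketch that assumes the packages are installed under their PyPI distribution names (`optimum-neuron`, `transformers-neuronx`, `transformers`). The Neuron SDK release is not a single pip package, so it is not checked here. `check_versions` is a hypothetical helper name.

```python
"""Compare installed package versions with the ones this README was tested with."""
from importlib.metadata import PackageNotFoundError, version

# Versions from the environment section. The Neuron SDK release itself is
# not one pip package, so it is deliberately left out of this check.
EXPECTED = {
    "optimum-neuron": "0.0.25",
    "transformers-neuronx": "0.12.313",
    "transformers": "4.45.2",
}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_versions()
    for package, (wanted, installed) in problems.items():
        print(f"{package}: expected {wanted}, found {installed or 'not installed'}")
    if not problems:
        print("environment matches the tested versions")
```

An empty result means every listed package is installed at exactly the tested version.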
### export
```
optimum-cli export neuron \
  --model meta-llama/Llama-3.2-1B-Instruct \
  --batch_size 1 \
  --sequence_length 1024 \
  --num_cores 2 \
  --auto_cast_type fp16 \
  ./models-hf/meta-llama/Llama-3.2-1B-Instruct
```

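
A pre-flight check can catch serving limits that do not fit the compiled model before the container starts. The sketch below reads the export settings back from the model directory and compares them with proposed TGI limits. It assumes optimum-neuron records `batch_size` and `sequence_length` under a `"neuron"` key in the exported `config.json`, which is worth confirming in your own output directory. `load_neuron_config` and `check_serving_limits` are hypothetical helper names.

```python
"""Check TGI serving limits against the shapes a Neuron model was exported with."""
import json
from pathlib import Path


def load_neuron_config(model_dir):
    """Return the export settings, ASSUMING they are stored under config.json["neuron"]."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return config["neuron"]


def check_serving_limits(neuron_config, max_batch_size, max_input_tokens, max_total_tokens):
    """Return a list of human-readable problems; an empty list means the limits fit."""
    problems = []
    if max_batch_size > neuron_config["batch_size"]:
        problems.append(
            f"max batch size {max_batch_size} exceeds exported "
            f"batch_size {neuron_config['batch_size']}"
        )
    if max_total_tokens > neuron_config["sequence_length"]:
        problems.append(
            f"max total tokens {max_total_tokens} exceeds exported "
            f"sequence_length {neuron_config['sequence_length']}"
        )
    if max_input_tokens >= max_total_tokens:
        # TGI needs room for at least one generated token after the prompt.
        problems.append("max input tokens must be smaller than max total tokens")
    return problems
```

With the values used in this README (batch size 1, 256 input tokens, 1024 total tokens), `check_serving_limits(load_neuron_config("./models-hf/meta-llama/Llama-3.2-1B-Instruct"), 1, 256, 1024)` should return an empty list if the export settings are recorded as assumed, while a total of 4096 tokens would be flagged.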
### run
```
docker run -it --name llama-31 --rm \
  -p 8080:80 \
  -v /home/ec2-user/models-hf/:/models \
  -e HF_MODEL_ID=/models/meta-llama/Llama-3.2-1B-Instruct \
  -e MAX_INPUT_TOKENS=256 \
  -e MAX_TOTAL_TOKENS=1024 \
  -e MAX_BATCH_SIZE=1 \
  -e LOG_LEVEL="info,text_generation_router=debug,text_generation_launcher=debug" \
  --device=/dev/neuron0 \
  neuronx-tgi:latest \
  --model-id /models/meta-llama/Llama-3.2-1B-Instruct \
  --max-batch-size 1 \
  --max-input-tokens 256 \
  --max-total-tokens 1024
```

The environment variables and the launcher flags configure the same limits, so keep them in agreement. Because the model was exported with `--batch_size 1` and `--sequence_length 1024`, `MAX_BATCH_SIZE`/`--max-batch-size` cannot exceed 1 and `MAX_TOTAL_TOKENS`/`--max-total-tokens` cannot exceed 1024.

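
Loading a compiled model onto the Neuron device can take a while after `docker run` starts the container, so a request sent immediately may fail. A minimal sketch that polls TGI's `GET /health` endpoint until it answers HTTP 200, using only the Python standard library; the helper name `wait_until_healthy` and the timeout values are illustrative assumptions.

```python
"""Poll a text-generation-inference server until its /health endpoint reports ready."""
import time
import urllib.request


def wait_until_healthy(base_url="http://127.0.0.1:8080", timeout_s=600.0, interval_s=5.0):
    """Return True once GET {base_url}/health answers 200, or False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError (non-2xx), refused connections and timeouts are all
            # OSError subclasses: the server is not up yet or still loading the model.
            pass
        time.sleep(interval_s)
    return False
```

The default `base_url` matches the `-p 8080:80` port mapping above.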
### test
```
curl 127.0.0.1:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}'
```
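
The same request can be sent from Python. This sketch uses only the standard library and assumes the server returns TGI's usual `/generate` response, a JSON object with a `generated_text` field; the helper name `generate` is hypothetical.

```python
"""Minimal client for the text-generation-inference /generate endpoint."""
import json
import urllib.request


def generate(prompt, max_new_tokens=20, base_url="http://127.0.0.1:8080", timeout_s=60.0):
    """POST a prompt to /generate and return the generated text."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    request = urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # Assumption: the response body looks like {"generated_text": "..."}.
    return body["generated_text"]
```

For example, `print(generate("What is Deep Learning?"))` mirrors the curl call. Prompts longer than the configured 256 input tokens are expected to be rejected by the server's request validation.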