Gonsoo committed
Commit 488983b • 1 Parent(s): 369258a

Update README.md

Files changed (1):
  1. README.md +1 -42

README.md CHANGED
@@ -14,48 +14,7 @@ This model is an AWS Neuron compiled version, neuron-cc 2.14, of the Korean fine
  ## Model Details

  This model is compiled with neuronx-cc version 2.14.
- It can be deployed with [v1.0-hf-tgi-0.0.24-pt-2.1.2-inf-neuronx-py310](https://github.com/aws/deep-learning-containers/releases?q=tgi+AND+neuronx&expanded=true)
-
-
- ## How to Get Started with the Model
-
- After logging in to Amazon ECR with the required permissions, you can pull the Docker image 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04-v1.0, download this model, and run a command like the following example:
- ```
- docker run \
-   -p 8080:80 \
-   -v $(pwd)/data:/data \
-   --privileged \
-   763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04-v1.0 \
-   --model-id /data/AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B
- ```
- After deployment, you can run inference like this:
- ```
- curl 127.0.0.1:8080/generate \
-   -X POST \
-   -d '{"inputs":"딥러닝이 뭐야?","parameters":{"max_new_tokens":512}}' \
-   -H 'Content-Type: application/json'
- ```
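The `/generate` call above can also be made from Python. The sketch below uses only the standard library and assumes the container from the `docker run` example is listening on `127.0.0.1:8080`; the helper names `build_generate_body` and `generate` are illustrative, not part of this model repository:

```python
import json
from urllib import request


def build_generate_body(prompt: str, max_new_tokens: int = 512) -> dict:
    # Same JSON body as the curl example above.
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def generate(prompt: str, host: str = "http://127.0.0.1:8080") -> str:
    # POST to TGI's /generate route and return the generated text.
    req = request.Request(
        f"{host}/generate",
        data=json.dumps(build_generate_body(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["generated_text"]
```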
- or, using the chat completions endpoint:
- ```
- curl localhost:8080/v1/chat/completions \
-   -X POST \
-   -d '{
-     "model": "tgi",
-     "messages": [
-       {
-         "role": "system",
-         "content": "당신은 인공지능 전문가 입니다."
-       },
-       {
-         "role": "user",
-         "content": "딥러닝이 무엇입니까?"
-       }
-     ],
-     "stream": false,
-     "max_tokens": 512
-   }' \
-   -H 'Content-Type: application/json'
- ```
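The chat completions request can likewise be sent from Python. A minimal standard-library sketch, assuming the same local container as above; `build_chat_body` and `chat` are illustrative names, and the response is parsed with the `choices[0].message.content` layout used by OpenAI-compatible servers:

```python
import json
from urllib import request

SYSTEM_PROMPT = "당신은 인공지능 전문가 입니다."  # "You are an AI expert."


def build_chat_body(prompt: str, max_tokens: int = 512) -> dict:
    # Same JSON body as the curl example above.
    return {
        "model": "tgi",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
        "stream": False,
        "max_tokens": max_tokens,
    }


def chat(prompt: str, host: str = "http://localhost:8080") -> str:
    # POST to the container's /v1/chat/completions route and return the reply text.
    req = request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(build_chat_body(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```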
  This model can be deployed to an Amazon SageMaker Endpoint with this guide: [Deploying a model stored in S3 to SageMaker INF2](https://github.com/aws-samples/aws-ai-ml-workshop-kr/blob/master/neuron/hf-optimum/04-Deploy-Llama3-8B-HF-TGI-Docker-On-INF2/notebook/03-deploy-llama-3-neuron-moel-inferentia2-from-S3.ipynb)
 
+ It can be deployed with [v1.0-hf-tgi-0.0.24-pt-2.1.2-inf-neuronx-py310](https://github.com/aws/deep-learning-containers/releases?q=tgi+AND+neuronx&expanded=true) on a SageMaker Endpoint, because this inference Docker image is only used on SageMaker.