Update README.md
README.md CHANGED
@@ -14,48 +14,7 @@ This model is an AWS Neuron compiled version, neuron-cc 2.14, of the Korean fine
## Model Details

This model is compiled with neuronx-cc version 2.14.

It can be deployed with [v1.0-hf-tgi-0.0.24-pt-2.1.2-inf-neuronx-py310](https://github.com/aws/deep-learning-containers/releases?q=tgi+AND+neuronx&expanded=true).

## How to Get Started with the Model

After logging in to Amazon ECR with the required permissions, you can pull the Docker image `763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04-v1.0`, download this model, and run a command like this example:

```
docker run \
  -p 8080:80 \
  -v $(pwd)/data:/data \
  --privileged \
  763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04-v1.0 \
  --model-id /data/AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B
```

After deployment, you can run inference like this:

```
curl 127.0.0.1:8080/generate \
  -X POST \
  -d '{"inputs":"What is deep learning?","parameters":{"max_new_tokens":512}}' \
  -H 'Content-Type: application/json'
```
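
The same `/generate` call can also be scripted. Below is a minimal Python sketch using only the standard library, assuming the container above is listening on 127.0.0.1:8080; the payload shape mirrors the curl example, and `build_payload` / `generate` are illustrative helper names, not part of this repository:

```python
import json
import urllib.request

# Host and port from the docker run example above
TGI_URL = "http://127.0.0.1:8080/generate"

def build_payload(prompt: str, max_new_tokens: int = 512) -> dict:
    """Build the same JSON body as the curl example."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def generate(prompt: str) -> str:
    """POST the prompt to TGI's /generate route and return the generated text."""
    req = urllib.request.Request(
        TGI_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```

Calling `generate("What is deep learning?")` then behaves like the curl command above.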

or

```
curl localhost:8080/v1/chat/completions \
  -X POST \
  -d '{
    "model": "tgi",
    "messages": [
      {
        "role": "system",
        "content": "You are an AI expert."
      },
      {
        "role": "user",
        "content": "What is deep learning?"
      }
    ],
    "stream": false,
    "max_tokens": 512
  }' \
  -H 'Content-Type: application/json'
```
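
The OpenAI-compatible route can be scripted the same way. A standard-library sketch mirroring the chat curl example above (the base URL and the `tgi` model name come from that example; `build_chat_payload` and `chat` are illustrative helpers):

```python
import json
import urllib.request

# Base URL from the curl example above
CHAT_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_payload(system: str, user: str, max_tokens: int = 512) -> dict:
    """Build the same chat-completions body as the curl example."""
    return {
        "model": "tgi",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "stream": False,
        "max_tokens": max_tokens,
    }

def chat(system: str, user: str) -> str:
    """POST a chat request and return the assistant's reply text."""
    req = urllib.request.Request(
        CHAT_URL,
        data=json.dumps(build_chat_payload(system, user)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```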

This model can be deployed to an Amazon SageMaker Endpoint with this guide: [Deploying a model stored in S3 to SageMaker INF2](https://github.com/aws-samples/aws-ai-ml-workshop-kr/blob/master/neuron/hf-optimum/04-Deploy-Llama3-8B-HF-TGI-Docker-On-INF2/notebook/03-deploy-llama-3-neuron-moel-inferentia2-from-S3.ipynb)

It can be deployed with [v1.0-hf-tgi-0.0.24-pt-2.1.2-inf-neuronx-py310](https://github.com/aws/deep-learning-containers/releases?q=tgi+AND+neuronx&expanded=true) on a SageMaker Endpoint, because this inference Docker image can only be used on SageMaker.