Update README.md
README.md CHANGED
@@ -14,48 +14,7 @@ This model is an AWS Neuron compiled version, neuron-cc 2.14, of the Korean fine
## Model Details

This model is compiled with neuronx-cc version 2.14.

It can be deployed with [v1.0-hf-tgi-0.0.24-pt-2.1.2-inf-neuronx-py310](https://github.com/aws/deep-learning-containers/releases?q=tgi+AND+neuronx&expanded=true).

## How to Get Started with the Model

After logging in to Amazon ECR with the required permissions, you can pull the Docker image `763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04-v1.0`, download this model, and run a command like this example:

```
docker run \
  -p 8080:80 \
  -v $(pwd)/data:/data \
  --privileged \
  763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04-v1.0 \
  --model-id /data/AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B
```

After deployment, you can run inference like this:

```
curl 127.0.0.1:8080/generate \
  -X POST \
  -d '{"inputs":"What is deep learning?","parameters":{"max_new_tokens":512}}' \
  -H 'Content-Type: application/json'
```
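
The same `/generate` call can also be scripted. Below is a minimal Python sketch using only the standard library, assuming the container above is listening on 127.0.0.1:8080; the payload shape mirrors the curl example, and `build_payload` / `generate` are illustrative helper names, not part of this repository:

```python
import json
import urllib.request

# Host and port from the docker run example above
TGI_URL = "http://127.0.0.1:8080/generate"

def build_payload(prompt: str, max_new_tokens: int = 512) -> dict:
    """Build the same JSON body as the curl example."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def generate(prompt: str) -> str:
    """POST the prompt to TGI's /generate route and return the generated text."""
    req = urllib.request.Request(
        TGI_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```

Calling `generate("What is deep learning?")` then behaves like the curl command above.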

or

```
curl localhost:8080/v1/chat/completions \
  -X POST \
  -d '{
    "model": "tgi",
    "messages": [
      {
        "role": "system",
        "content": "You are an AI expert."
      },
      {
        "role": "user",
        "content": "What is deep learning?"
      }
    ],
    "stream": false,
    "max_tokens": 512
  }' \
  -H 'Content-Type: application/json'
```
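
The OpenAI-compatible route can be scripted the same way. A standard-library sketch mirroring the chat curl example above (the base URL and the `tgi` model name come from that example; `build_chat_payload` and `chat` are illustrative helpers):

```python
import json
import urllib.request

# Base URL from the curl example above
CHAT_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_payload(system: str, user: str, max_tokens: int = 512) -> dict:
    """Build the same chat-completions body as the curl example."""
    return {
        "model": "tgi",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "stream": False,
        "max_tokens": max_tokens,
    }

def chat(system: str, user: str) -> str:
    """POST a chat request and return the assistant's reply text."""
    req = urllib.request.Request(
        CHAT_URL,
        data=json.dumps(build_chat_payload(system, user)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```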

This model can be deployed to an Amazon SageMaker Endpoint with this guide: [Deploying a model stored in S3 to SageMaker INF2](https://github.com/aws-samples/aws-ai-ml-workshop-kr/blob/master/neuron/hf-optimum/04-Deploy-Llama3-8B-HF-TGI-Docker-On-INF2/notebook/03-deploy-llama-3-neuron-moel-inferentia2-from-S3.ipynb)

It can be deployed with [v1.0-hf-tgi-0.0.24-pt-2.1.2-inf-neuronx-py310](https://github.com/aws/deep-learning-containers/releases?q=tgi+AND+neuronx&expanded=true) on a SageMaker Endpoint, because this inference Docker image can only be used on SageMaker.