Gonsoo committed
Commit 369258a
1 Parent(s): 5e6c622

Create README.md

Files changed (1)
  1. README.md +73 -0
README.md ADDED
---
license: mit
language:
- ko
- en
base_model: MLP-KTLim/llama-3-Korean-Bllossom-8B
---

# Model Card for AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B

This model is an AWS Neuron-compiled version (neuronx-cc 2.14) of the Korean fine-tuned model MLP-KTLim/llama-3-Korean-Bllossom-8B, available at https://huggingface.co/MLP-KTLim/llama-3-Korean-Bllossom-8B. It is intended for deployment on Amazon EC2 Inferentia2 and Amazon SageMaker. For detailed information about the model and its license, please refer to the original MLP-KTLim/llama-3-Korean-Bllossom-8B model page.

## Model Details

This model is compiled with neuronx-cc version 2.14.
It can be deployed with the Deep Learning Container release [v1.0-hf-tgi-0.0.24-pt-2.1.2-inf-neuronx-py310](https://github.com/aws/deep-learning-containers/releases?q=tgi+AND+neuronx&expanded=true).
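
As a reference, a compiled artifact like this is typically produced by exporting the base model with the Optimum Neuron CLI. The command below is only a minimal sketch: the batch size, sequence length, core count, and cast type are illustrative assumptions, not the settings used for this repository; the compilation guide linked further down describes the actual steps.

```
# Sketch only: export the base model to a Neuron-compiled artifact
# (parameter values are assumptions, not the settings used for this model)
optimum-cli export neuron \
  --model MLP-KTLim/llama-3-Korean-Bllossom-8B \
  --batch_size 4 \
  --sequence_length 4096 \
  --num_cores 2 \
  --auto_cast_type fp16 \
  ./AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B
```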

## How to Get Started with the Model

After logging in to Amazon ECR with the required permissions, you can pull the Docker image 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04-v1.0, download this model, and run the container as in the example below.
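
The login, pull, and download steps might look like the following sketch; the region, the use of `huggingface-cli`, and the Hugging Face repository id `Gonsoo/AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B` are assumptions for illustration.

```
# Authenticate Docker against the AWS Deep Learning Containers registry (region assumed: us-west-2)
aws ecr get-login-password --region us-west-2 | \
  docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com

# Pull the TGI Neuron image referenced above
docker pull 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04-v1.0

# Download this model into ./data (repository id assumed for illustration)
huggingface-cli download Gonsoo/AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B \
  --local-dir ./data/AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B
```

Then launch the TGI container: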
```
docker run \
 -p 8080:80 \
 -v $(pwd)/data:/data \
 --privileged \
 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04-v1.0 \
 --model-id /data/AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B
```
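
If you prefer not to run the container with `--privileged`, exposing the Neuron device explicitly is a common alternative. This is a sketch based on standard Docker device mapping; the device names depend on the instance size (an inf2.xlarge exposes a single `/dev/neuron0`).

```
# Sketch: map the Neuron device explicitly instead of using --privileged
docker run \
 -p 8080:80 \
 -v $(pwd)/data:/data \
 --device=/dev/neuron0 \
 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04-v1.0 \
 --model-id /data/AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B
```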
After deployment, you can run inference like this:
```
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"딥러닝이 뭐야?","parameters":{"max_new_tokens":512}}' \
    -H 'Content-Type: application/json'
```
or, using the OpenAI-compatible chat completions endpoint:
```
curl localhost:8080/v1/chat/completions \
    -X POST \
    -d '{
  "model": "tgi",
  "messages": [
    {
      "role": "system",
      "content": "당신은 인공지능 전문가 입니다."
    },
    {
      "role": "user",
      "content": "딥러닝이 무엇입니까?"
    }
  ],
  "stream": false,
  "max_tokens": 512
}' \
    -H 'Content-Type: application/json'
```
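
Before sending requests, you can check that the server is up; this assumes the standard TGI health and info endpoints, which are not described in this card.

```
# Liveness check: returns HTTP 200 once the model is loaded and ready
curl -i 127.0.0.1:8080/health

# Model and deployment metadata
curl 127.0.0.1:8080/info
```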

This model can be deployed to an Amazon SageMaker endpoint with this guide: [Deploying a model stored in S3 to SageMaker INF2](https://github.com/aws-samples/aws-ai-ml-workshop-kr/blob/master/neuron/hf-optimum/04-Deploy-Llama3-8B-HF-TGI-Docker-On-INF2/notebook/03-deploy-llama-3-neuron-moel-inferentia2-from-S3.ipynb).

For the detailed neuron-compilation and deployment steps, you can refer to [Serving on Amazon EC2 Inferentia2 with a Docker image from Amazon ECR](https://github.com/aws-samples/aws-ai-ml-workshop-kr/blob/master/neuron/hf-optimum/04-Deploy-Llama3-8B-HF-TGI-Docker-On-INF2/README-NeuronCC-2-14.md).

## Hardware

At a minimum, you can use an Amazon EC2 inf2.xlarge instance; more powerful members of the family, such as inf2.8xlarge, inf2.24xlarge, and inf2.48xlarge, also work.
Detailed information is available at [Amazon EC2 Inf2 Instances](https://aws.amazon.com/ec2/instance-types/inf2/).
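
Once you are on an Inf2 instance with the Neuron driver and tools installed, you can confirm that the Neuron devices are visible before starting the container. This assumes the aws-neuronx-tools package, which this card does not cover.

```
# List the Neuron devices and NeuronCores available on the instance
neuron-ls

# Live NeuronCore and memory utilization (press q to quit)
neuron-top
```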

## Model Card Contact

Gonsoo Moon, [email protected]