Create README.md
Browse files
README.md
ADDED
@@ -0,0 +1,73 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
---
|
2 |
+
license: mit
|
3 |
+
language:
|
4 |
+
- ko
|
5 |
+
- en
|
6 |
+
base_model: MLP-KTLim/llama-3-Korean-Bllossom-8B
|
7 |
+
---
|
8 |
+
# Model Card for Model ID
|
9 |
+
|
10 |
+
<!-- Provide a quick summary of what the model is/does. -->
|
11 |
+
|
12 |
+
This model is an AWS Neuron compiled version, neuron-cc 2.14, of the Korean fine-tuned model MLP-KTLim/llama-3-Korean-Bllossom-8B, available at https://huggingface.co/MLP-KTLim/llama-3-Korean-Bllossom-8B. It is intended for deployment on Amazon EC2 Inferentia2 and Amazon SageMaker. For detailed information about the model and its license, please refer to the original MLP-KTLim/llama-3-Korean-Bllossom-8B model page
|
13 |
+
|
14 |
+
## Model Details
|
15 |
+
|
16 |
+
This model is compiled with neuronx-cc version, 2.14
|
17 |
+
It can be deployed with [v1.0-hf-tgi-0.0.24-pt-2.1.2-inf-neuronx-py310](https://github.com/aws/deep-learning-containers/releases?q=tgi+AND+neuronx&expanded=true)
|
18 |
+
|
19 |
+
|
20 |
+
## How to Get Started with the Model
|
21 |
+
|
22 |
+
After logging in to Amazon ECR with permission, You can pull the docker image 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04-v1.0 , downlaod this model and run the command like this example:
|
23 |
+
```
|
24 |
+
docker run \
|
25 |
+
-p 8080:80 \
|
26 |
+
-v $(pwd)/data:/data \
|
27 |
+
--privileged \
|
28 |
+
763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04-v1.0 \
|
29 |
+
--model-id /data/AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B
|
30 |
+
```
|
31 |
+
After deployment, you can inference like this
|
32 |
+
```
|
33 |
+
curl 127.0.0.1:8080/generate \
|
34 |
+
-X POST \
|
35 |
+
-d '{"inputs":"๋ฅ๋ฌ๋์ด ๋ญ์ผ?","parameters":{"max_new_tokens":512}}' \
|
36 |
+
-H 'Content-Type: application/json'
|
37 |
+
```
|
38 |
+
or
|
39 |
+
```
|
40 |
+
curl localhost:8080/v1/chat/completions \
|
41 |
+
-X POST \
|
42 |
+
-d '{
|
43 |
+
"model": "tgi",
|
44 |
+
"messages": [
|
45 |
+
{
|
46 |
+
"role": "system",
|
47 |
+
"content": "๋น์ ์ ์ธ๊ณต์ง๋ฅ ์ ๋ฌธ๊ฐ ์
๋๋ค."
|
48 |
+
},
|
49 |
+
{
|
50 |
+
"role": "user",
|
51 |
+
"content": "๋ฅ๋ฌ๋์ด ๋ฌด์์
๋๊น?"
|
52 |
+
}
|
53 |
+
],
|
54 |
+
"stream": false,
|
55 |
+
"max_tokens": 512
|
56 |
+
}' \
|
57 |
+
-H 'Content-Type: application/json'
|
58 |
+
```
|
59 |
+
|
60 |
+
This model can be deployed to Amazon SageMaker Endtpoint with this guide, [S3 ์ ์ ์ฅ๋ ๋ชจ๋ธ์ SageMaker INF2 ์ ๋ฐฐํฌํ๊ธฐ](https://github.com/aws-samples/aws-ai-ml-workshop-kr/blob/master/neuron/hf-optimum/04-Deploy-Llama3-8B-HF-TGI-Docker-On-INF2/notebook/03-deploy-llama-3-neuron-moel-inferentia2-from-S3.ipynb)
|
61 |
+
|
62 |
+
In order to do neuron-compilation and depoly in detail , you can refer to [Amazon ECR ์ ๋์ปค ์ด๋ฏธ์ง ๊ธฐ๋ฐํ์ Amazon EC2 Inferentia2 ์๋นํ๊ธฐ](https://github.com/aws-samples/aws-ai-ml-workshop-kr/blob/master/neuron/hf-optimum/04-Deploy-Llama3-8B-HF-TGI-Docker-On-INF2/README-NeuronCC-2-14.md)
|
63 |
+
|
64 |
+
|
65 |
+
## Hardware
|
66 |
+
|
67 |
+
At a minimum hardware, you can use Amazon EC2 inf2.xlarge and more powerful family such as inf2.8xlarge, inf2.24xlarge and inf2.48xlarge.
|
68 |
+
The detailed information is [Amazon EC2 Inf2 Instances](https://aws.amazon.com/ec2/instance-types/inf2/)
|
69 |
+
|
70 |
+
|
71 |
+
## Model Card Contact
|
72 |
+
|
73 |
+
Gonsoo Moon, [email protected]
|