Gonsoo committed
Commit 369258a
1 Parent(s): 5e6c622

Create README.md

Files changed (1)
  1. README.md +73 -0
README.md ADDED
---
license: mit
language:
- ko
- en
base_model: MLP-KTLim/llama-3-Korean-Bllossom-8B
---

# Model Card for AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B

This model is an AWS Neuron-compiled version (neuronx-cc 2.14) of the Korean fine-tuned model MLP-KTLim/llama-3-Korean-Bllossom-8B, available at https://huggingface.co/MLP-KTLim/llama-3-Korean-Bllossom-8B. It is intended for deployment on Amazon EC2 Inferentia2 and Amazon SageMaker. For detailed information about the model and its license, please refer to the original MLP-KTLim/llama-3-Korean-Bllossom-8B model page.

## Model Details

This model is compiled with neuronx-cc version 2.14.
It can be deployed with the Deep Learning Container release [v1.0-hf-tgi-0.0.24-pt-2.1.2-inf-neuronx-py310](https://github.com/aws/deep-learning-containers/releases?q=tgi+AND+neuronx&expanded=true).
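
As a reference, a compiled artifact like this is typically produced by exporting the base model with the Optimum Neuron CLI. The command below is only a minimal sketch: the batch size, sequence length, core count, and cast type are illustrative assumptions, not the settings used for this repository; the compilation guide linked further down describes the actual steps.

```
# Sketch only: export the base model to a Neuron-compiled artifact
# (parameter values are assumptions, not the settings used for this model)
optimum-cli export neuron \
  --model MLP-KTLim/llama-3-Korean-Bllossom-8B \
  --batch_size 4 \
  --sequence_length 4096 \
  --num_cores 2 \
  --auto_cast_type fp16 \
  ./AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B
```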

## How to Get Started with the Model

After logging in to Amazon ECR with the required permissions, you can pull the Docker image 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04-v1.0, download this model, and run the container as in the example below.
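
The login, pull, and download steps might look like the following sketch; the region, the use of `huggingface-cli`, and the Hugging Face repository id `Gonsoo/AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B` are assumptions for illustration.

```
# Authenticate Docker against the AWS Deep Learning Containers registry (region assumed: us-west-2)
aws ecr get-login-password --region us-west-2 | \
  docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com

# Pull the TGI Neuron image referenced above
docker pull 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04-v1.0

# Download this model into ./data (repository id assumed for illustration)
huggingface-cli download Gonsoo/AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B \
  --local-dir ./data/AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B
```

Then launch the TGI container: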
```
docker run \
 -p 8080:80 \
 -v $(pwd)/data:/data \
 --privileged \
 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04-v1.0 \
 --model-id /data/AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B
```
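
If you prefer not to run the container with `--privileged`, exposing the Neuron device explicitly is a common alternative. This is a sketch based on standard Docker device mapping; the device names depend on the instance size (an inf2.xlarge exposes a single `/dev/neuron0`).

```
# Sketch: map the Neuron device explicitly instead of using --privileged
docker run \
 -p 8080:80 \
 -v $(pwd)/data:/data \
 --device=/dev/neuron0 \
 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04-v1.0 \
 --model-id /data/AWS-NeuronCC-2-14-llama-3-Korean-Bllossom-8B
```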
After deployment, you can run inference like this:
```
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"딥러닝이 뭐야?","parameters":{"max_new_tokens":512}}' \
    -H 'Content-Type: application/json'
```
or, using the OpenAI-compatible chat completions endpoint:
```
curl localhost:8080/v1/chat/completions \
    -X POST \
    -d '{
  "model": "tgi",
  "messages": [
    {
      "role": "system",
      "content": "당신은 인공지능 전문가 입니다."
    },
    {
      "role": "user",
      "content": "딥러닝이 무엇입니까?"
    }
  ],
  "stream": false,
  "max_tokens": 512
}' \
    -H 'Content-Type: application/json'
```
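
Before sending requests, you can check that the server is up; this assumes the standard TGI health and info endpoints, which are not described in this card.

```
# Liveness check: returns HTTP 200 once the model is loaded and ready
curl -i 127.0.0.1:8080/health

# Model and deployment metadata
curl 127.0.0.1:8080/info
```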

This model can be deployed to an Amazon SageMaker endpoint with this guide: [Deploying a model stored in S3 to SageMaker INF2](https://github.com/aws-samples/aws-ai-ml-workshop-kr/blob/master/neuron/hf-optimum/04-Deploy-Llama3-8B-HF-TGI-Docker-On-INF2/notebook/03-deploy-llama-3-neuron-moel-inferentia2-from-S3.ipynb).

For the detailed neuron-compilation and deployment steps, you can refer to [Serving on Amazon EC2 Inferentia2 with a Docker image from Amazon ECR](https://github.com/aws-samples/aws-ai-ml-workshop-kr/blob/master/neuron/hf-optimum/04-Deploy-Llama3-8B-HF-TGI-Docker-On-INF2/README-NeuronCC-2-14.md).

## Hardware

At a minimum, you can use an Amazon EC2 inf2.xlarge instance; more powerful members of the family, such as inf2.8xlarge, inf2.24xlarge, and inf2.48xlarge, also work.
Detailed information is available at [Amazon EC2 Inf2 Instances](https://aws.amazon.com/ec2/instance-types/inf2/).
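
Once you are on an Inf2 instance with the Neuron driver and tools installed, you can confirm that the Neuron devices are visible before starting the container. This assumes the aws-neuronx-tools package, which this card does not cover.

```
# List the Neuron devices and NeuronCores available on the instance
neuron-ls

# Live NeuronCore and memory utilization (press q to quit)
neuron-top
```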

## Model Card Contact

Gonsoo Moon, [email protected]