Update README.md

If you have more than 4GB of GPU memory, you can run it at high speed.
Because Japanese and Chinese data were used heavily during quantization, perplexity measured on Japanese data is known to be better than that of [hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4).
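
As a rough illustration of what that comparison measures, the sketch below computes perplexity over a short Japanese text with this repo's model; the sample text and single-sequence setup are illustrative assumptions, not the procedure behind the reported result.

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dahara1/llama3.1-8b-Instruct-awq"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Illustrative Japanese sample text; the model card does not state which
# Japanese data was used for the reported perplexity.
text = "吾輩は猫である。名前はまだ無い。どこで生れたかとんと見当がつかぬ。"
enc = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # With labels equal to input_ids, a causal LM returns the mean
    # cross-entropy loss over the sequence; exp(loss) is the perplexity.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"perplexity: {torch.exp(loss).item():.2f}")
```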

## Setup

```
pip install -q --upgrade transformers autoawq accelerate
```

## Sample script

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig

model_id = "dahara1/llama3.1-8b-Instruct-awq"

quantization_config = AwqConfig(
    bits=4,
    fuse_max_seq_len=512, # Note: Update this as per your use-case
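    # The arguments below are an illustrative assumption for a typical fused-AWQ
    # setup (do_fuse=True enables autoawq's fused modules); they are not taken
    # from this model card.
    do_fuse=True,
)

# Minimal usage sketch under the same assumptions: load the quantized model
# with the fused config above and run a short chat-style generation.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
    quantization_config=quantization_config,
)

messages = [
    {"role": "user", "content": "What is the capital of Japan?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and print only the newly generated tokens.
output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))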