dahara1 committed
Commit ccb8e1b · verified · 1 Parent(s): 0fe87f1

Update README.md

Files changed (1): README.md +7 −1
README.md CHANGED
@@ -15,13 +15,19 @@ If you have more than 4GB of GPU memory, you can run it at high speed.  
 Because Japanese and Chinese data were used heavily during quantization, perplexity measured on Japanese data is known to be better than [hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4).
 
 
+## Setup
+```
+pip install -q --upgrade transformers autoawq accelerate
+```
+
+
+## Sample script
 ```
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig
 
 model_id = "dahara1/llama3.1-8b-Instruct-awq"
 
-
 quantization_config = AwqConfig(
     bits=4,
     fuse_max_seq_len=512,  # Note: Update this as per your use-case
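
The hunk ends mid-script, at the `fuse_max_seq_len` argument. For context, a minimal sketch of how an AWQ fused-module load in transformers typically continues is shown below; the `do_fuse` flag, the generation settings, and the example prompt are assumptions here, not lines from the committed README.

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig

model_id = "dahara1/llama3.1-8b-Instruct-awq"

quantization_config = AwqConfig(
    bits=4,
    fuse_max_seq_len=512,  # Note: Update this as per your use-case
    do_fuse=True,          # assumption: enable fused modules for faster decoding
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
    quantization_config=quantization_config,
)

# assumption: a simple chat-style prompt; the README's own example is not visible in this hunk
messages = [{"role": "user", "content": "日本の首都はどこですか?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

With fused modules, `fuse_max_seq_len` bounds prompt plus generated tokens, which is why the committed comment says to update it per use-case.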
 
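On the perplexity comparison above: the commit does not show how it was measured. As a rough illustration only (not the author's evaluation script), a standard sliding-window perplexity loop over Japanese text might look like the sketch below; the sample text, `max_length`, and `stride` are placeholders.

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dahara1/llama3.1-8b-Instruct-awq"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# placeholder: any held-out Japanese text would go here
text = "吾輩は猫である。名前はまだ無い。どこで生れたかとんと見当がつかぬ。"
encodings = tokenizer(text, return_tensors="pt")

max_length = 512  # context window per chunk (placeholder)
stride = 256      # overlap between successive chunks (placeholder)
seq_len = encodings.input_ids.size(1)

nlls, prev_end = [], 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # score only tokens not already scored
    input_ids = encodings.input_ids[:, begin:end].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask the already-scored prefix
    with torch.no_grad():
        # .loss is the mean NLL over unmasked targets; times trg_len approximates the sum
        nlls.append(model(input_ids, labels=target_ids).loss * trg_len)
    prev_end = end
    if end == seq_len:
        break

ppl = torch.exp(torch.stack(nlls).sum() / prev_end)
print(f"perplexity: {ppl.item():.2f}")
```

A lower value on the same Japanese corpus, versus the hugging-quants INT4 model loaded the same way, would support the README's claim.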