dahara1 committed
Commit ccb8e1b · verified · 1 Parent(s): 0fe87f1

Update README.md

Files changed (1): README.md +7 −1
README.md CHANGED
@@ -15,13 +15,19 @@ If you have more than 4GB of GPU memory, you can run it at high speed.  
 Because Japanese and Chinese data were used heavily during quantization, perplexity measured on Japanese data is known to be better than [hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4).
 
 
+## Setup
+```
+pip install -q --upgrade transformers autoawq accelerate
+```
+
+
+## Sample script
 ```
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig
 
 model_id = "dahara1/llama3.1-8b-Instruct-awq"
 
-
 quantization_config = AwqConfig(
     bits=4,
     fuse_max_seq_len=512,  # Note: Update this as per your use-case
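
The hunk ends mid-script, at the `fuse_max_seq_len` argument. For context, a minimal sketch of how an AWQ fused-module load in transformers typically continues is shown below; the `do_fuse` flag, the generation settings, and the example prompt are assumptions here, not lines from the committed README.

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig

model_id = "dahara1/llama3.1-8b-Instruct-awq"

quantization_config = AwqConfig(
    bits=4,
    fuse_max_seq_len=512,  # Note: Update this as per your use-case
    do_fuse=True,          # assumption: enable fused modules for faster decoding
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
    quantization_config=quantization_config,
)

# assumption: a simple chat-style prompt; the README's own example is not visible in this hunk
messages = [{"role": "user", "content": "日本の首都はどこですか?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

With fused modules, `fuse_max_seq_len` bounds prompt plus generated tokens, which is why the committed comment says to update it per use-case.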
 
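On the perplexity comparison above: the commit does not show how it was measured. As a rough illustration only (not the author's evaluation script), a standard sliding-window perplexity loop over Japanese text might look like the sketch below; the sample text, `max_length`, and `stride` are placeholders.

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dahara1/llama3.1-8b-Instruct-awq"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# placeholder: any held-out Japanese text would go here
text = "吾輩は猫である。名前はまだ無い。どこで生れたかとんと見当がつかぬ。"
encodings = tokenizer(text, return_tensors="pt")

max_length = 512  # context window per chunk (placeholder)
stride = 256      # overlap between successive chunks (placeholder)
seq_len = encodings.input_ids.size(1)

nlls, prev_end = [], 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # score only tokens not already scored
    input_ids = encodings.input_ids[:, begin:end].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask the already-scored prefix
    with torch.no_grad():
        # .loss is the mean NLL over unmasked targets; times trg_len approximates the sum
        nlls.append(model(input_ids, labels=target_ids).loss * trg_len)
    prev_end = end
    if end == seq_len:
        break

ppl = torch.exp(torch.stack(nlls).sum() / prev_end)
print(f"perplexity: {ppl.item():.2f}")
```

A lower value on the same Japanese corpus, versus the hugging-quants INT4 model loaded the same way, would support the README's claim.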