Splend1dchan committed (verified)
Commit: ecf885a · Parent: 7b22c9c

Update README.md

Files changed (1):
  1. README.md (+6 −6)
README.md CHANGED
@@ -82,7 +82,7 @@ First install direct dependencies:
 ```
 pip install transformers torch accelerate
 ```
-If you want faster inference using flash-attention2, you need to install these dependencies:
+<p style="color:red;">Flash-attention2 is strongly recommended for long context scenarios.</p>
 ```bash
 pip install packaging ninja
 pip install flash-attn
@@ -92,11 +92,11 @@ Then load the model in transformers:
 >>> from transformers import AutoModelForCausalLM, AutoTokenizer
 >>> tokenizer = AutoTokenizer.from_pretrained("MediaTek-Research/Breeze-7B-32k-Instruct-v1_0/")
 >>> model = AutoModelForCausalLM.from_pretrained(
-    "MediaTek-Research/Breeze-7B-Instruct-v0_1",
-    device_map="auto",
-    torch_dtype=torch.bfloat16,
-    attn_implementation="flash_attention_2"
-)
+>>> "MediaTek-Research/Breeze-7B-32k-Instruct-v1_0",
+... device_map="auto",
+... torch_dtype=torch.bfloat16,
+... attn_implementation="flash_attention_2"
+... )
 >>> chat = [
 ... {"role": "user", "content": "你好,請問你可以完成什麼任務?"},
 ... {"role": "assistant", "content": "你好,我可以幫助您解決各種問題、提供資訊和協助您完成許多不同的任務。例如:回答技術問題、提供建議、翻譯文字、尋找資料或協助您安排行程等。請告訴我如何能幫助您。"},