Splend1dchan committed (verified)
Commit: ecf885a · Parent: 7b22c9c

Update README.md

Files changed (1):
  1. README.md (+6 −6)
README.md CHANGED
@@ -82,7 +82,7 @@ First install direct dependencies:
 ```
 pip install transformers torch accelerate
 ```
-If you want faster inference using flash-attention2, you need to install these dependencies:
+<p style="color:red;">Flash-attention2 is strongly recommended for long context scenarios.</p>
 ```bash
 pip install packaging ninja
 pip install flash-attn
@@ -92,11 +92,11 @@ Then load the model in transformers:
 >>> from transformers import AutoModelForCausalLM, AutoTokenizer
 >>> tokenizer = AutoTokenizer.from_pretrained("MediaTek-Research/Breeze-7B-32k-Instruct-v1_0/")
 >>> model = AutoModelForCausalLM.from_pretrained(
-    "MediaTek-Research/Breeze-7B-Instruct-v0_1",
-    device_map="auto",
-    torch_dtype=torch.bfloat16,
-    attn_implementation="flash_attention_2"
-)
+>>> "MediaTek-Research/Breeze-7B-32k-Instruct-v1_0",
+... device_map="auto",
+... torch_dtype=torch.bfloat16,
+... attn_implementation="flash_attention_2"
+... )
 >>> chat = [
 ... {"role": "user", "content": "你好,請問你可以完成什麼任務?"},
 ... {"role": "assistant", "content": "你好,我可以幫助您解決各種問題、提供資訊和協助您完成許多不同的任務。例如:回答技術問題、提供建議、翻譯文字、尋找資料或協助您安排行程等。請告訴我如何能幫助您。"},