t-w committed
Commit: c5a7742
Parent(s): 75d6bcb

Update README.md

Files changed (1): README.md (+5 -2)
README.md CHANGED
@@ -28,7 +28,7 @@ The name `baku` comes from the Japanese word [`獏/ばく/Baku`](https://ja.wiki
 
 | Size | Continual Pre-Training | Instruction-Tuning |
 | :- | :- | :- |
-| 2B | Gemma 2 Baku 2B [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b) | Gemma 2 Baku 2B Instruct [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b-instruct) |
+| 2B | Gemma 2 Baku 2B [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b) | Gemma 2 Baku 2B Instruct [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b-it) |
 
 * **Library**
 
@@ -71,7 +71,7 @@ model_id = "rinna/gemma-2-baku-2b"
 pipeline = transformers.pipeline(
     "text-generation",
     model=model_id,
-    model_kwargs={"torch_dtype": torch.bfloat16},
+    model_kwargs={"torch_dtype": torch.bfloat16, "attn_implementation": "eager"},
     device_map="auto"
 )
 output = pipeline(
@@ -82,6 +82,9 @@ output = pipeline(
 print(output[0]["generated_text"])
 ~~~
 
+It is recommended to use eager attention when conducting batch inference under bfloat16 precision.
+Currently, Gemma 2 yields NaN values for input sequences with padding when the default attention mechanism (torch.scaled_dot_product_attention) is employed in conjunction with bfloat16.
+
 ---
 
 # Tokenization
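
The note added by this commit concerns padded batches, which the README's single-prompt pipeline example does not exercise. Below is a minimal sketch of that scenario, assuming standard `transformers` APIs: the model is loaded with `attn_implementation="eager"` and `torch.bfloat16`, and a batch of different-length prompts is padded before generation. The prompts, token budget, and left-padding choice are illustrative and not taken from the README.

~~~python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/gemma-2-baku-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Left padding is the usual choice for batched generation with decoder-only models.
tokenizer.padding_side = "left"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # per the README note: avoids NaNs with padded bfloat16 batches
    device_map="auto",
)

# Prompts of different lengths force padding in the batch (illustrative examples).
prompts = ["西田幾多郎は、", "夏目漱石の代表作は"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=40)

for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(text)
~~~

With the default SDPA attention, the same padded batch under bfloat16 is what the commit warns can produce NaN outputs; switching to eager attention only affects the attention kernel, so single-prompt usage shown earlier in the README is unchanged.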