t-w committed
Commit: c5a7742
Parent(s): 75d6bcb

Update README.md

Files changed (1): README.md (+5 -2)
README.md CHANGED
@@ -28,7 +28,7 @@ The name `baku` comes from the Japanese word [`獏/ばく/Baku`](https://ja.wiki
 
 | Size | Continual Pre-Training | Instruction-Tuning |
 | :- | :- | :- |
-| 2B | Gemma 2 Baku 2B [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b) | Gemma 2 Baku 2B Instruct [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b-instruct) |
+| 2B | Gemma 2 Baku 2B [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b) | Gemma 2 Baku 2B Instruct [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b-it) |
 
 * **Library**
 
@@ -71,7 +71,7 @@ model_id = "rinna/gemma-2-baku-2b"
 pipeline = transformers.pipeline(
     "text-generation",
     model=model_id,
-    model_kwargs={"torch_dtype": torch.bfloat16},
+    model_kwargs={"torch_dtype": torch.bfloat16, "attn_implementation": "eager"},
     device_map="auto"
 )
 output = pipeline(
@@ -82,6 +82,9 @@ output = pipeline(
 print(output[0]["generated_text"])
 ~~~
 
+It is recommended to use eager attention when conducting batch inference under bfloat16 precision.
+Currently, Gemma 2 yields NaN values for input sequences with padding when the default attention mechanism (torch.scaled_dot_product_attention) is employed in conjunction with bfloat16.
+
 ---
 
 # Tokenization
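
The note added by this commit concerns padded batches, which the README's single-prompt pipeline example does not exercise. Below is a minimal sketch of that scenario, assuming standard `transformers` APIs: the model is loaded with `attn_implementation="eager"` and `torch.bfloat16`, and a batch of different-length prompts is padded before generation. The prompts, token budget, and left-padding choice are illustrative and not taken from the README.

~~~python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/gemma-2-baku-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Left padding is the usual choice for batched generation with decoder-only models.
tokenizer.padding_side = "left"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # per the README note: avoids NaNs with padded bfloat16 batches
    device_map="auto",
)

# Prompts of different lengths force padding in the batch (illustrative examples).
prompts = ["西田幾多郎は、", "夏目漱石の代表作は"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=40)

for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(text)
~~~

With the default SDPA attention, the same padded batch under bfloat16 is what the commit warns can produce NaN outputs; switching to eager attention only affects the attention kernel, so single-prompt usage shown earlier in the README is unchanged.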