# Gemma-7B in 8-bit with bitsandbytes

This is the repository for Gemma-7B quantized to 8-bit using bitsandbytes. The original model card and license for Gemma-7B can be found here. This is the base model; it is not instruction fine-tuned.
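For reference, a checkpoint like this one can be produced by loading the original weights with an 8-bit bitsandbytes configuration. A minimal sketch (the `save_pretrained` step and the output directory name are illustrative):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the original Gemma-7B weights to 8-bit with bitsandbytes
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    quantization_config=quantization_config,
    device_map="auto",
)

# Optionally serialize the quantized weights (directory name is illustrative)
model.save_pretrained("gemma-7b-8bit")
```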

## Usage

Please visit the original Gemma-7B model card for intended uses and limitations.

You can use this model as follows:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The tokenizer comes from the original Gemma-7B repository
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")

# Load the pre-quantized 8-bit weights
model = AutoModelForCausalLM.from_pretrained(
    "merve/gemma-7b-8bit",
    device_map="auto",
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
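Note that `generate` produces only a short continuation by default; pass `max_new_tokens` to control the output length, for example:

```python
outputs = model.generate(**input_ids, max_new_tokens=200)
```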