# Gemma-7B in 8-bit with bitsandbytes

This is the repository for Gemma-7B quantized to 8-bit using bitsandbytes. The original model card and license for Gemma-7B can be found here. This is the base model; it is not instruction fine-tuned.
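For reference, a checkpoint like this one can be produced by loading the original weights with an 8-bit bitsandbytes configuration. A minimal sketch (the `save_pretrained` step and the output directory name are illustrative):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the original Gemma-7B weights to 8-bit with bitsandbytes
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    quantization_config=quantization_config,
    device_map="auto",
)

# Optionally serialize the quantized weights (directory name is illustrative)
model.save_pretrained("gemma-7b-8bit")
```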

## Usage

Please visit the original Gemma-7B model card for intended uses and limitations.

You can use this model as follows:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The tokenizer comes from the original Gemma-7B repository
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")

# Load the pre-quantized 8-bit weights
model = AutoModelForCausalLM.from_pretrained(
    "merve/gemma-7b-8bit",
    device_map="auto",
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
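Note that `generate` produces only a short continuation by default; pass `max_new_tokens` to control the output length, for example:

```python
outputs = model.generate(**input_ids, max_new_tokens=200)
```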