GGUF uploads

All popular quantization variants are uploaded.

Finetune Gemma 2, Llama 3.1, and Mistral 2-5x faster with 70% less memory via Unsloth!

Directly quantized 4bit model with bitsandbytes.

We have a Google Colab Tesla T4 notebook for Gemma 2 (2B) here: https://colab.research.google.com/drive/1weTpKOjBZxZJ5PQ-Ql8i6ptAY2x-FWVA?usp=sharing

We have a Google Colab Tesla T4 notebook for Gemma 2 (9B) here: https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing

✨ Finetune for Free

All notebooks are beginner friendly! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.

| Unsloth supports | Free Notebooks | Performance | Memory use |
|---|---|---|---|
| Llama 3 (8B) | ▶️ Start on Colab | 2.4x faster | 58% less |
| Gemma 2 (9B) | ▶️ Start on Colab | 2x faster | 63% less |
| Mistral (7B) | ▶️ Start on Colab | 2.2x faster | 62% less |
| Phi 3 (mini) | ▶️ Start on Colab | 2x faster | 63% less |
| TinyLlama | ▶️ Start on Colab | 3.9x faster | 74% less |
| CodeLlama (34B) A100 | ▶️ Start on Colab | 1.9x faster | 27% less |
| Mistral (7B) 1xT4 | ▶️ Start on Kaggle | 5x faster* | 62% less |
| DPO - Zephyr | ▶️ Start on Colab | 1.9x faster | 19% less |
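
To read the Performance and Memory use columns as concrete numbers, here is a minimal sketch. It assumes "Nx faster" means the baseline training time divided by N, and "P% less" means the baseline memory scaled by (1 - P/100). The baseline figures (60 minutes, 16 GB) are hypothetical and chosen only for illustration; they are not measurements from the notebooks.

```python
def projected_time(baseline_minutes: float, speedup: float) -> float:
    """Training time if 'Nx faster' means the baseline time divided by N."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_minutes / speedup


def projected_memory(baseline_gb: float, percent_less: float) -> float:
    """Memory use if 'P% less' means the baseline scaled by (1 - P/100)."""
    if not 0 <= percent_less <= 100:
        raise ValueError("percent_less must be between 0 and 100")
    return baseline_gb * (1 - percent_less / 100)


# Hypothetical baseline: 60 minutes and 16 GB without Unsloth.
# Gemma 2 (9B) row from the table: 2x faster, 63% less memory.
print(f"{projected_time(60, 2.0):.1f} min")   # 30.0 min
print(f"{projected_memory(16, 63):.2f} GB")   # 5.92 GB
```

Actual savings depend on the GPU, batch size, and sequence length, so treat these as back-of-the-envelope figures.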
GGUF
Model size: 2.61B params
Architecture: gemma2
Quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit