About the Model
- Developed by: NuclearAi
- License: apache-2.0
- Finetuned from model: google/gemma-3-1b-it
Gemma is a family of lightweight, state-of-the-art open models from Google, built using the same research and technology as the Gemini models. However, the base Gemma models are relatively weak at reasoning, which makes them less capable than some other models on reasoning-heavy tasks.
At Nuclear AI, we enhance Gemma's abilities by fine-tuning it with GRPO (Group Relative Policy Optimization) on a specialized dataset designed to improve its reasoning skills. Since this is an experimental model, we used 150 rows of high-quality data and ran five fine-tuning steps, which took around 30 minutes.
When we tested the model, we were truly impressed by its performance! We would love to hear your feedback so we can work on fine-tuning a larger version with more steps and greater computational power.
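For anyone curious what such a run looks like in practice, here is a minimal, illustrative sketch of a GRPO fine-tune with Unsloth and TRL. The dataset contents, the reward_contains_tags reward function, and the hyperparameters below are placeholder assumptions for illustration, not the exact recipe used for this model.

# Illustrative GRPO fine-tuning sketch with Unsloth + TRL (not the exact recipe used for this model).
from unsloth import FastModel
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

model, tokenizer = FastModel.from_pretrained(
    model_name = "google/gemma-3-1b-it",
    max_seq_length = 1024,
    load_in_4bit = False,
)
# (Optionally wrap the model with LoRA adapters via FastModel.get_peft_model before training.)

# Placeholder for ~150 rows of reasoning prompts; the real dataset is not published.
train_dataset = Dataset.from_list(
    [{"prompt": "If a train travels 60 km in 45 minutes, what is its average speed?"}] * 150
)

def reward_contains_tags(completions, **kwargs):
    # Toy reward: encourage the <think>...</think> / <response>...</response> structure.
    return [1.0 if "<think>" in c and "<response>" in c else 0.0 for c in completions]

trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [reward_contains_tags],
    train_dataset = train_dataset,
    args = GRPOConfig(
        max_steps = 5,                        # the "five steps" mentioned above
        per_device_train_batch_size = 4,
        num_generations = 4,                  # completions sampled per prompt
        max_completion_length = 512,
        learning_rate = 5e-6,
        output_dir = "outputs",
    ),
)
trainer.train()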
Installing Libraries
# 1. Install the specific Gemma 3 compatible transformers
pip install --no-deps git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3
# 2. Install Unsloth (adjust based on your environment - e.g., remove [colab-new] if not on Colab)
pip install "unsloth[colab-new]@git+https://github.com/unslothai/unsloth.git"
# 3. Install PyTorch (select command based on your CUDA version from https://pytorch.org/)
# Example for CUDA 12.1:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# Example for CPU only:
# pip install torch torchvision torchaudio
# 4. Install accelerate and bitsandbytes
pip install accelerate bitsandbytes
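Optionally, you can run a quick sanity check to confirm that the packages import cleanly and that a GPU is visible. This snippet is not part of the official setup; it is just a convenience check.

# Optional sanity check: confirm the installs worked and see which device will be used.
import unsloth   # import first so Unsloth can apply its patches
import torch
import transformers

print("Transformers version:", transformers.__version__)
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))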
Code To Run
import torch
from unsloth import FastModel
from transformers import TextStreamer
# 1. Model and Tokenizer Loading
max_seq_length = 1024
model_name = "NuclearAi/Nuke_X_Gemma3_1B_Reasoner_Testing"
print(f"Loading model: {model_name}...")
model, tokenizer = FastModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    dtype = None,          # Let Unsloth choose the best dtype (float16, bf16, float32)
    load_in_4bit = False,  # Set to True if you want 4-bit quantization
    device_map = "auto",   # Automatically use GPU if available
)
print("Model loaded.")
# 2. Define Prompt Structure
reasoning_start = "<think>"
reasoning_end = "</think>"
solution_start = "<response>"
solution_end = "</response>"
system_prompt = \
f"""You are given a problem.
Think about the problem and provide your working out.
Place it between {reasoning_start} and {reasoning_end}.
Then, provide your solution between {solution_start}{solution_end}"""
# 3. User Input
user_question = "Write a short story about a cat who learns to fly." # Try another question
# 4. Format Input for Chat Model
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_question},
]
text_input = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # Important for generation
)
# 5. Tokenize and Prepare for Generation
device = model.device if hasattr(model, 'device') else ('cuda' if torch.cuda.is_available() else 'cpu')
inputs = tokenizer([text_input], return_tensors="pt").to(device)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
# 6. Generate Response
print("\n--- Model Response ---")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        streamer=streamer,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        top_k=50,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
print("\n--- End of Response ---")
Thank you for your support!
Jay Shree Ram 🚩🚩