Gemma Text Generation
A simple, configurable Python script for generating text using the Gemma-3-1B "Thinking" model from Hugging Face.
Features
- Device auto-detection (CPU, CUDA, MPS)
- Command-line interface for easy configuration
- Reusable text generation function
- Multiple example prompts
- Support for chat-formatted input
Model Information
This project uses vinhnx90/gemma3-1b-thinking, a fine-tuned version of google/gemma-3-1b-it. The model was trained using TRL with GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
Training Approach
This model was fine-tuned with Reinforcement Learning to enhance reasoning capabilities:
- Used reasoning chains from OpenAI's GSM8K dataset
- Implemented GRPO reward functions
- Based on Will Brown's approach
- Training implementation from Ben Burtenshaw's Colab
The model is available on Hugging Face: vinhnx90/gemma3-1b-thinking
Training Details
- Base Model: google/gemma-3-1b-it
- Library: transformers
- Training Method: GRPO (from DeepSeekMath paper)
- Framework Versions:
  - TRL: 0.16.0.dev0
  - Transformers: 4.50.0.dev0
  - PyTorch: 2.5.1+cu124
  - Datasets: 3.3.2
  - Tokenizers: 0.21.0
Requirements
- torch
- transformers
Installation
- Clone this repository or download the script
- Install the required packages:
pip install torch transformers
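After installing, you can optionally confirm which device the script's auto-detection logic would pick; a minimal check, assuming a standard PyTorch build:

import torch

# Report the accelerator that auto-detection would choose
if torch.cuda.is_available():
    print("CUDA GPU available")
elif torch.backends.mps.is_available():
    print("Apple MPS available")
else:
    print("Falling back to CPU")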
Usage
Basic Script Mode
Run the script directly to see output from several example prompts:
python gemma_text_gen.py
Command Line Interface
Use command-line arguments to customize execution:
python gemma_text_gen.py --prompt "Write a haiku about programming" --model "vinhnx90/gemma3-1b-thinking" --device cuda --max-tokens 256
Available Arguments
Argument | Description | Default |
---|---|---|
--prompt | Input text for generation | "If you had a time machine..." |
--model | Hugging Face model name | "vinhnx90/gemma3-1b-thinking" |
--device | Computing device (cpu, cuda, or mps; omit for auto-detection) | Auto-detect |
--max-tokens | Maximum number of new tokens to generate | 128 |
Quick Start Example
from transformers import pipeline
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="vinhnx90/gemma3-1b-thinking", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
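If no CUDA GPU is present, the same snippet should work by pointing the pipeline at another device; a minimal variant (not specific to this model):

generator = pipeline("text-generation", model="vinhnx90/gemma3-1b-thinking", device="cpu")  # or device="mps" on Apple Silicon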
As a Module
You can also import and use the function in your own code:
from gemma_text_gen import generate_text
response = generate_text(
    prompt="Explain quantum computing to me like I'm five years old.",
    model_name="vinhnx90/gemma3-1b-thinking",
    device="cuda",
    max_tokens=200,
)
print(response)
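Because generate_text is a plain function, it can also be called in a loop over several prompts; a small sketch (the prompts here are illustrative):

from gemma_text_gen import generate_text

prompts = [
    "Summarize the plot of Hamlet in two sentences.",
    "Give me one tip for writing readable Python.",
]

# Reuses the same function for each prompt; note that this sketch
# re-creates the generation pipeline on every call
for prompt in prompts:
    print(generate_text(prompt, max_tokens=96))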
Full Implementation
import argparse

import torch
from transformers import pipeline


def generate_text(prompt, model_name="vinhnx90/gemma3-1b-thinking", device=None, max_tokens=128):
    """
    Generate text using a Hugging Face model with configurable parameters.

    Args:
        prompt (str): The input prompt to generate text from
        model_name (str): Hugging Face model name
        device (str): Computing device ('cpu', 'cuda', 'mps', or None for auto-detection)
        max_tokens (int): Maximum number of new tokens to generate

    Returns:
        str: Generated text
    """
    # Auto-detect device if not specified
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"

    # Initialize the pipeline
    generator = pipeline("text-generation", model=model_name, device=device)

    # Format the input for chat models
    formatted_input = [{"role": "user", "content": prompt}]

    # Generate the response
    output = generator(formatted_input, max_new_tokens=max_tokens, return_full_text=False)[0]
    return output["generated_text"]


def main():
    # Set up command-line arguments
    parser = argparse.ArgumentParser(description="Generate text using Hugging Face models")
    parser.add_argument("--prompt", type=str,
                        default="If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?",
                        help="The prompt for text generation")
    parser.add_argument("--model", type=str, default="vinhnx90/gemma3-1b-thinking",
                        help="Hugging Face model name")
    parser.add_argument("--device", type=str, choices=["cpu", "cuda", "mps"], default=None,
                        help="Computing device (cpu, cuda, or mps); omit for auto-detection")
    parser.add_argument("--max-tokens", type=int, default=128,
                        help="Maximum number of new tokens to generate")
    args = parser.parse_args()

    # Run generation with the provided arguments
    response = generate_text(args.prompt, args.model, args.device, args.max_tokens)
    print(f"Prompt: {args.prompt}\n")
    print(f"Response:\n{response}")


if __name__ == "__main__":
    # Example usage directly in script
    examples = [
        "Explain quantum computing to me like I'm five years old.",
        "What are three ways to improve my productivity while working from home?",
        "Write a short poem about artificial intelligence.",
    ]

    print("Running in script mode with examples:\n")
    for example in examples:
        print("-" * 50)
        print(f"Prompt: {example}\n")
        response = generate_text(example)
        print(f"Response:\n{response}\n")

    # Uncomment to run with command-line arguments instead
    # main()
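On smaller GPUs, loading the model in half precision can reduce memory use. One possible tweak to the pipeline call, assuming your hardware supports bfloat16 (this is not part of the original script):

import torch
from transformers import pipeline

# Same pipeline as above, but requesting bfloat16 weights to lower memory usage
generator = pipeline(
    "text-generation",
    model="vinhnx90/gemma3-1b-thinking",
    device="cuda",
    torch_dtype=torch.bfloat16,
)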
Citations
Implementation References
- Will Brown's Approach: GitHub Gist
- Ben Burtenshaw's Implementation: Twitter/X Post
GRPO
@article{zhihong2024deepseekmath,
    title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year = 2024,
    eprint = {arXiv:2402.03300},
}
TRL
@misc{vonwerra2022trl,
    title = {{TRL: Transformer Reinforcement Learning}},
    author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year = 2020,
    journal = {GitHub repository},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
License
This project is licensed under the same license as the base model.
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.