Gemma Text Generation
A simple, configurable Python script for generating text using the Gemma-3-1B "Thinking" model from Hugging Face.
Features
- Device auto-detection (CPU, CUDA, MPS)
- Command-line interface for easy configuration
- Reusable text generation function
- Multiple example prompts
- Support for chat-formatted input
Model Information
This project uses vinhnx90/gemma3-1b-thinking, a fine-tuned version of google/gemma-3-1b-it. The model was trained using TRL with GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
Training Approach
This model was fine-tuned with Reinforcement Learning to enhance reasoning capabilities:
- Used reasoning chains from OpenAI's GSM8K dataset
- Implemented GRPO reward functions
- Based on Will Brown's approach
- Training implementation from Ben Burtenshaw's Colab
The model is available on Hugging Face: vinhnx90/gemma3-1b-thinking
Training Details
- Base Model: google/gemma-3-1b-it
- Library: transformers
- Training Method: GRPO (from DeepSeekMath paper)
- Framework Versions:
  - TRL: 0.16.0.dev0
  - Transformers: 4.50.0.dev0
  - PyTorch: 2.5.1+cu124
  - Datasets: 3.3.2
  - Tokenizers: 0.21.0
Requirements
- torch
- transformers
Installation
- Clone this repository or download the script
- Install the required packages:
pip install torch transformers
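After installing, you can optionally confirm which device the script's auto-detection logic would pick; a minimal check, assuming a standard PyTorch build:

import torch

# Report the accelerator that auto-detection would choose
if torch.cuda.is_available():
    print("CUDA GPU available")
elif torch.backends.mps.is_available():
    print("Apple MPS available")
else:
    print("Falling back to CPU")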
Usage
Basic Script Mode
Run the script directly to see output from several example prompts:
python gemma_text_gen.py
Command Line Interface
Use command-line arguments to customize execution:
python gemma_text_gen.py --prompt "Write a haiku about programming" --model "vinhnx90/gemma3-1b-thinking" --device cuda --max-tokens 256
Available Arguments
Argument | Description | Default |
---|---|---|
--prompt | Input text for generation | "If you had a time machine..." |
--model | Hugging Face model name | "vinhnx90/gemma3-1b-thinking" |
--device | Computing device (cpu, cuda, or mps; omit for auto-detection) | Auto-detect |
--max-tokens | Maximum number of new tokens to generate | 128 |
Quick Start Example
from transformers import pipeline
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="vinhnx90/gemma3-1b-thinking", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
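If no CUDA GPU is present, the same snippet should work by pointing the pipeline at another device; a minimal variant (not specific to this model):

generator = pipeline("text-generation", model="vinhnx90/gemma3-1b-thinking", device="cpu")  # or device="mps" on Apple Silicon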
As a Module
You can also import and use the function in your own code:
from gemma_text_gen import generate_text
response = generate_text(
    prompt="Explain quantum computing to me like I'm five years old.",
    model_name="vinhnx90/gemma3-1b-thinking",
    device="cuda",
    max_tokens=200,
)
print(response)
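Because generate_text is a plain function, it can also be called in a loop over several prompts; a small sketch (the prompts here are illustrative):

from gemma_text_gen import generate_text

prompts = [
    "Summarize the plot of Hamlet in two sentences.",
    "Give me one tip for writing readable Python.",
]

# Reuses the same function for each prompt; note that this sketch
# re-creates the generation pipeline on every call
for prompt in prompts:
    print(generate_text(prompt, max_tokens=96))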
Full Implementation
import argparse

import torch
from transformers import pipeline


def generate_text(prompt, model_name="vinhnx90/gemma3-1b-thinking", device=None, max_tokens=128):
    """
    Generate text using a Hugging Face model with configurable parameters.

    Args:
        prompt (str): The input prompt to generate text from
        model_name (str): Hugging Face model name
        device (str): Computing device ('cpu', 'cuda', 'mps', or None for auto-detection)
        max_tokens (int): Maximum number of new tokens to generate

    Returns:
        str: Generated text
    """
    # Auto-detect device if not specified
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"

    # Initialize the pipeline
    generator = pipeline("text-generation", model=model_name, device=device)

    # Format the input for chat models
    formatted_input = [{"role": "user", "content": prompt}]

    # Generate the response
    output = generator(formatted_input, max_new_tokens=max_tokens, return_full_text=False)[0]
    return output["generated_text"]


def main():
    # Set up command-line arguments
    parser = argparse.ArgumentParser(description="Generate text using Hugging Face models")
    parser.add_argument("--prompt", type=str,
                        default="If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?",
                        help="The prompt for text generation")
    parser.add_argument("--model", type=str, default="vinhnx90/gemma3-1b-thinking",
                        help="Hugging Face model name")
    parser.add_argument("--device", type=str, choices=["cpu", "cuda", "mps"], default=None,
                        help="Computing device (cpu, cuda, or mps); omit for auto-detection")
    parser.add_argument("--max-tokens", type=int, default=128,
                        help="Maximum number of new tokens to generate")
    args = parser.parse_args()

    # Run generation with the provided arguments
    response = generate_text(args.prompt, args.model, args.device, args.max_tokens)
    print(f"Prompt: {args.prompt}\n")
    print(f"Response:\n{response}")


if __name__ == "__main__":
    # Example usage directly in script
    examples = [
        "Explain quantum computing to me like I'm five years old.",
        "What are three ways to improve my productivity while working from home?",
        "Write a short poem about artificial intelligence.",
    ]

    print("Running in script mode with examples:\n")
    for example in examples:
        print("-" * 50)
        print(f"Prompt: {example}\n")
        response = generate_text(example)
        print(f"Response:\n{response}\n")

    # Uncomment to run with command-line arguments instead
    # main()
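On smaller GPUs, loading the model in half precision can reduce memory use. One possible tweak to the pipeline call, assuming your hardware supports bfloat16 (this is not part of the original script):

import torch
from transformers import pipeline

# Same pipeline as above, but requesting bfloat16 weights to lower memory usage
generator = pipeline(
    "text-generation",
    model="vinhnx90/gemma3-1b-thinking",
    device="cuda",
    torch_dtype=torch.bfloat16,
)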
Citations
Implementation References
- Will Brown's Approach: GitHub Gist
- Ben Burtenshaw's Implementation: Twitter/X Post
GRPO
@article{zhihong2024deepseekmath,
    title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year = 2024,
    eprint = {arXiv:2402.03300},
}
TRL
@misc{vonwerra2022trl,
    title = {{TRL: Transformer Reinforcement Learning}},
    author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year = 2020,
    journal = {GitHub repository},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
License
This project is licensed under the same license as the base model.
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.