Gemma Text Generation

A simple, configurable Python script for generating text using the Gemma-3-1B "Thinking" model from Hugging Face.

Features

  • Device auto-detection (CPU, CUDA, MPS)
  • Command-line interface for easy configuration
  • Reusable text generation function
  • Multiple example prompts
  • Support for chat-formatted input

Model Information

This project uses vinhnx90/gemma3-1b-thinking, which is a fine-tuned version of google/gemma-3-1b-it. The model was trained using TRL with GRPO, a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.

Training Approach

This model was fine-tuned with reinforcement learning (GRPO) to enhance its reasoning capabilities.

The model is available on Hugging Face: vinhnx90/gemma3-1b-thinking
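For a rough sense of what a GRPO fine-tune with TRL looks like, the sketch below uses TRL's GRPOTrainer. The dataset and reward function are illustrative placeholders only; they are not the actual recipe used to train vinhnx90/gemma3-1b-thinking.

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def reward_completion_length(completions, **kwargs):
    # Toy reward: favour completions that are neither too short nor too long.
    return [-abs(400 - len(c)) / 400 for c in completions]

# Placeholder dataset; the data actually used for this model is not documented here.
dataset = load_dataset("trl-lib/tldr", split="train")

training_args = GRPOConfig(output_dir="gemma3-1b-grpo", logging_steps=10)
trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",           # base model listed under Training Details
    reward_funcs=reward_completion_length,  # one or more reward functions
    args=training_args,
    train_dataset=dataset,
)
trainer.train()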

Training Details

  • Base Model: google/gemma-3-1b-it
  • Library: transformers
  • Training Method: GRPO (from DeepSeekMath paper)
  • Framework Versions:
    • TRL: 0.16.0.dev0
    • Transformers: 4.50.0.dev0
    • Pytorch: 2.5.1+cu124
    • Datasets: 3.3.2
    • Tokenizers: 0.21.0

Requirements

torch
transformers

Installation

  1. Clone this repository or download the script
  2. Install the required packages:
pip install torch transformers
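To confirm the installation and check whether a GPU is visible, a quick one-liner:

python -c "import torch, transformers; print(torch.__version__, transformers.__version__, torch.cuda.is_available())"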

Usage

Basic Script Mode

Run the script directly to see output from several example prompts:

python gemma_text_gen.py

Command Line Interface

Use command-line arguments to customize execution (as shipped, the script runs the built-in example prompts; uncomment the main() call at the bottom of the script to enable the CLI):

python gemma_text_gen.py --prompt "Write a haiku about programming" --model "vinhnx90/gemma3-1b-thinking" --device cuda --max-tokens 256

Available Arguments

  • --prompt: Input text for generation (default: "If you had a time machine...")
  • --model: Hugging Face model name (default: "vinhnx90/gemma3-1b-thinking")
  • --device: Computing device (cpu, cuda, or mps; default: auto-detect)
  • --max-tokens: Maximum number of new tokens to generate (default: 128)
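For example, to force CPU execution with a shorter generation budget:

python gemma_text_gen.py --prompt "Write a short poem about artificial intelligence." --device cpu --max-tokens 64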

Quick Start Example

from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="vinhnx90/gemma3-1b-thinking", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
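The snippet above assumes a CUDA GPU. On a CPU-only or Apple Silicon machine you can pick the device automatically, mirroring what the script itself does:

import torch
from transformers import pipeline

# Pick the best available backend: CUDA GPU, Apple Silicon (MPS), or CPU.
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
generator = pipeline("text-generation", model="vinhnx90/gemma3-1b-thinking", device=device)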

As a Module

You can also import and use the function in your own code:

from gemma_text_gen import generate_text

response = generate_text(
    prompt="Explain quantum computing to me like I'm five years old.",
    model_name="vinhnx90/gemma3-1b-thinking",
    device="cuda",
    max_tokens=200
)

print(response)
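Note that generate_text builds a fresh pipeline on every call, so the model is reloaded each time. If you need many generations, a small wrapper that keeps one pipeline alive is faster. A minimal sketch (not part of the shipped script):

from transformers import pipeline

class GemmaGenerator:
    """Load the pipeline once and reuse it for multiple prompts."""

    def __init__(self, model_name="vinhnx90/gemma3-1b-thinking", device=None):
        # device=None lets transformers fall back to its default placement.
        self.generator = pipeline("text-generation", model=model_name, device=device)

    def generate(self, prompt, max_tokens=128):
        messages = [{"role": "user", "content": prompt}]
        output = self.generator(messages, max_new_tokens=max_tokens, return_full_text=False)[0]
        return output["generated_text"]

gen = GemmaGenerator(device="cuda")
print(gen.generate("Explain quantum computing to me like I'm five years old."))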

Full Implementation

import argparse
import torch
from transformers import pipeline

def generate_text(prompt, model_name="vinhnx90/gemma3-1b-thinking", device=None, max_tokens=128):
    """
    Generate text using a Hugging Face model with configurable parameters.
    
    Args:
        prompt (str): The input prompt to generate text from
        model_name (str): Hugging Face model name
        device (str): Computing device ('cpu', 'cuda', 'mps', or None for auto-detection)
        max_tokens (int): Maximum number of new tokens to generate
    
    Returns:
        str: Generated text
    """
    # Auto-detect device if not specified
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
    
    # Initialize the pipeline
    generator = pipeline("text-generation", model=model_name, device=device)
    
    # Format the input for chat models
    formatted_input = [{"role": "user", "content": prompt}]
    
    # Generate the response
    output = generator(formatted_input, max_new_tokens=max_tokens, return_full_text=False)[0]
    
    return output["generated_text"]

def main():
    # Set up command line arguments
    parser = argparse.ArgumentParser(description="Generate text using Hugging Face models")
    parser.add_argument("--prompt", type=str, default="If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?",
                        help="The prompt for text generation")
    parser.add_argument("--model", type=str, default="vinhnx90/gemma3-1b-thinking",
                        help="Hugging Face model name")
    parser.add_argument("--device", type=str, choices=["cpu", "cuda", "mps", None], default=None,
                        help="Computing device (cpu, cuda, mps, or None for auto)")
    parser.add_argument("--max-tokens", type=int, default=128,
                        help="Maximum number of tokens to generate")
    
    args = parser.parse_args()
    
    # Run generation with provided arguments
    response = generate_text(args.prompt, args.model, args.device, args.max_tokens)
    print(f"Prompt: {args.prompt}\n")
    print(f"Response:\n{response}")

if __name__ == "__main__":
    # Example usage directly in script
    examples = [
        "Explain quantum computing to me like I'm five years old.",
        "What are three ways to improve my productivity while working from home?",
        "Write a short poem about artificial intelligence."
    ]
    
    print("Running in script mode with examples:\n")
    
    for example in examples:
        print("-" * 50)
        print(f"Prompt: {example}\n")
        response = generate_text(example)
        print(f"Response:\n{response}\n")
    
    # Uncomment to run with command line arguments instead
    # main()

Citations

Implementation References

GRPO

@article{zhihong2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}

TRL

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}

License

This project is licensed under the same license as the base model.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
