Gemma Causal Language Model (GemmaCausalLM)

This repository contains the configuration and metadata for PolicyLens, a GemmaCausalLM fine-tuned from google/gemma-2-2b for causal language modeling tasks such as text generation, dialogue systems, and other autoregressive NLP workloads.


Model Overview

1. Core Architecture

The GemmaCausalLM pairs a GemmaBackbone with a dedicated preprocessor, providing an efficient setup for NLP tasks. Its key components are listed below; a short code sketch follows each list.

Backbone (GemmaBackbone):

  • Vocabulary Size: 256,000 tokens.
  • Model Depth: 26 layers.
  • Attention Configuration:
    • Query Heads: 8
    • Key-Value Heads: 4
    • Head Dimension: 256
    • Sliding Window Attention: Enabled (window size: 4096).
  • Dimensions:
    • Hidden Dimension: 2,304
    • Intermediate Dimension: 18,432
  • Normalization and Regularization:
    • Layer Normalization (Epsilon: 1e-6).
    • Post-feedforward and post-attention normalization enabled.
  • Soft Caps:
    • Final Logit Soft Cap: 30.0
    • Attention Logit Soft Cap: 50.0
  • Dropout: Disabled.
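
For reference, the hyperparameters above map onto the GemmaBackbone constructor roughly as follows. This is a minimal sketch; the keyword names are assumptions based on the Keras Hub Gemma implementation, not values read from this repository:

    import keras_hub

    # Instantiating allocates ~2.6 B randomly initialized parameters.
    backbone = keras_hub.models.GemmaBackbone(
        vocabulary_size=256_000,
        num_layers=26,
        num_query_heads=8,
        num_key_value_heads=4,
        head_dim=256,
        hidden_dim=2304,
        intermediate_dim=18_432,
        layer_norm_epsilon=1e-6,
        use_post_ffw_norm=True,
        use_post_attention_norm=True,
        final_logit_soft_cap=30.0,
        attention_logit_soft_cap=50.0,
        use_sliding_window_attention=True,
        sliding_window_size=4096,
        dropout=0.0,
    )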

Preprocessor (GemmaCausalLMPreprocessor):

  • Tokenizer (GemmaTokenizer):
    • Configuration File: tokenizer.json.
    • Adds BOS (Beginning of Sequence) and EOS (End of Sequence) tokens.
  • Sequence Length: 512.
  • Data Type:
    • Float32 for preprocessor computations.
    • Int32 for tokenized inputs.
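
In practice, the preprocessor turns raw strings into padded token-id tensors of the configured sequence length. A minimal sketch, assuming the preset directory layout described under Model Files below:

    import keras_hub

    preprocessor = keras_hub.models.GemmaCausalLMPreprocessor.from_preset(
        "path/to/model_dir", sequence_length=512
    )
    # Calling the preprocessor yields (features, labels, sample_weights) for
    # causal LM training; features contain "token_ids" and "padding_mask".
    x, y, sample_weight = preprocessor(["The quick brown fox."])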

Metadata

  • Keras Version: 3.5.0
  • Keras Hub Version: 0.17.0
  • Parameter Count: 2,617,270,528 (≈2.6 billion).
  • Date Saved: 2024-11-18@13:59:51

This metadata pins the library versions and save date, which supports reproducible loading of the model.


Applications

This model is designed for tasks requiring causal language modeling, including but not limited to:

  • Text Generation.
  • Dialogue Systems.
  • Autoregressive NLP tasks.

Model Files

  • Backbone Configuration: The core architecture details for GemmaBackbone.
  • Preprocessor Configuration: Tokenization and sequence preprocessing setup.
  • Tokenizer File: tokenizer.json.
  • Preprocessor File: preprocessor.json.

Setup and Usage

  1. Dependencies: Ensure the following libraries are installed (the saved model targets Keras 3.5.0 and Keras Hub 0.17.0, per the metadata above):

    pip install "keras>=3.5.0" "keras-hub>=0.17.0"
    
  2. Model Loading: A saved Keras Hub task is loaded with from_preset, pointed at the directory containing the configuration, weights, and preprocessor files (from_config expects a config dictionary rather than a file path, and keras_hub.src is a private module that should not be imported directly):

    import keras_hub

    model = keras_hub.models.GemmaCausalLM.from_preset("path/to/model_dir")
    
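Keras Hub can also resolve presets straight from the Hugging Face Hub via an hf:// handle. A minimal sketch, assuming network access and that this repository follows the standard Keras Hub preset layout:

    model = keras_hub.models.GemmaCausalLM.from_preset("hf://p2kalita/PolicyLens")
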
  3. Inference: The task bundles its preprocessor, so generate() accepts raw strings and handles tokenization and detokenization internally:

    output = model.generate("Your input text here.", max_length=128)
    print(output)
    
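Decoding behavior can be changed by recompiling the task with a different sampler. A minimal sketch using the built-in top-k sampler (the choice of sampler and the k value here are illustrative):

    import keras_hub

    # Replace the default sampler with top-k sampling; any
    # keras_hub.samplers instance can be passed here.
    model.compile(sampler=keras_hub.samplers.TopKSampler(k=5))
    print(model.generate("Your input text here.", max_length=128))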

Contributions

Feel free to contribute to this repository by improving configurations, extending functionality, or reporting issues.


License

This project is licensed under the MIT License. See the LICENSE file for details.
