p2kalita committed
Commit 34bf858 (verified)
1 Parent(s): fe6c06f

Update README.md

Files changed (1)
  1. README.md +124 -3
README.md CHANGED

---
license: gemma
language:
- en
- hi
base_model:
- google/gemma-2-2b
pipeline_tag: question-answering
library_name: keras
tags:
- legal
---

# Gemma Causal Language Model (GemmaCausalLM)

This repository contains the configuration and metadata for the `GemmaCausalLM` model, a causal language model designed for NLP tasks such as text generation, dialogue systems, and autoregressive language modeling.

---

## Model Overview

### **1. Core Architecture**
`GemmaCausalLM` pairs a `GemmaBackbone` with a matching preprocessor, providing an efficient setup for causal language modeling. Its key components are listed below, with brief code sketches after the backbone and preprocessor lists.

#### **Backbone (`GemmaBackbone`):**
- **Vocabulary Size**: 256,000 tokens.
- **Model Depth**: 26 layers.
- **Attention Configuration**:
  - Query Heads: 8
  - Key-Value Heads: 4
  - Head Dimension: 256
  - Sliding Window Attention: Enabled (window size: 4096).
- **Dimensions**:
  - Hidden Dimension: 2,304
  - Intermediate Dimension: 18,432
- **Normalization and Regularization**:
  - Layer Normalization (Epsilon: 1e-6).
  - Post-feedforward and post-attention normalization enabled.
- **Soft Caps**:
  - Final Logit Soft Cap: 30.0
  - Attention Logit Soft Cap: 50.0
- **Dropout**: Disabled.

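For reference, the sketch below shows how a backbone with this configuration could be constructed directly. It is a minimal, hedged example: the argument names assume the current `keras_hub.models.GemmaBackbone` constructor (including its Gemma 2 options) and may differ across versions, and building it this way allocates randomly initialized weights rather than the trained ones, so loading the saved preset (see Setup and Usage) remains the practical path.

```python
import keras_hub

# Sketch only: a randomly initialized GemmaBackbone matching the
# configuration listed above (argument names assume the keras_hub API).
backbone = keras_hub.models.GemmaBackbone(
    vocabulary_size=256_000,
    num_layers=26,
    num_query_heads=8,
    num_key_value_heads=4,
    hidden_dim=2304,
    intermediate_dim=18432,
    head_dim=256,
    layer_norm_epsilon=1e-6,
    dropout=0.0,
    use_post_ffw_norm=True,
    use_post_attention_norm=True,
    final_logit_soft_cap=30.0,
    attention_logit_soft_cap=50.0,
    use_sliding_window_attention=True,
    sliding_window_size=4096,
)
print(f"{backbone.count_params():,} parameters")
```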

#### **Preprocessor (`GemmaCausalLMPreprocessor`):**
- **Tokenizer (`GemmaTokenizer`)**:
  - Configuration File: `tokenizer.json`.
  - Adds BOS (Beginning of Sequence) and EOS (End of Sequence) tokens.
- **Sequence Length**: 512.
- **Data Type**:
  - Float32 for preprocessor computations.
  - Int32 for tokenized inputs.

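As a hedged usage sketch (assuming the saved preset files and the standard `keras_hub` preprocessor API, with `path/to/preset` as a placeholder for a local directory containing them), the preprocessor can be loaded and applied to raw text:

```python
import keras_hub

# Sketch only: load the saved preprocessor and tokenize a prompt into
# fixed-length (512) inputs with BOS/EOS handling applied.
preprocessor = keras_hub.models.GemmaCausalLMPreprocessor.from_preset(
    "path/to/preset", sequence_length=512
)
x, y, sample_weight = preprocessor(["Your input text here."])
print(x["token_ids"].shape)     # (1, 512) int32 token ids
print(x["padding_mask"].shape)  # (1, 512) mask over real vs. padded tokens
```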

---

## Metadata

- **Keras Version**: `3.5.0`
- **Keras Hub Version**: `0.17.0`
- **Parameter Count**: `2,617,270,528` (about 2.6 billion parameters).
- **Date Saved**: `2024-11-18@13:59:51`

This metadata records the environment used to save the model and its overall scale; the snippet below shows how to check it against your installation.

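A quick, self-contained check against an installed environment (assuming `keras_hub` exposes a `__version__` attribute, as current releases do):

```python
import keras
import keras_hub

# Compare installed versions with the values recorded above.
print("Keras:", keras.__version__)         # expected: 3.5.0
print("KerasHub:", keras_hub.__version__)  # expected: 0.17.0

# The parameter count can be verified after loading (see Setup and Usage):
# model.count_params()  ->  2,617,270,528
```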

---

## Applications

This model is designed for tasks requiring causal language modeling, including but not limited to:
- Text Generation.
- Dialogue Systems.
- Autoregressive NLP tasks.

---

## Model Files

- **Backbone Configuration**:
  The core architecture details for `GemmaBackbone`.
- **Preprocessor Configuration**:
  Tokenization and sequence preprocessing setup.
- **Tokenizer File**:
  `tokenizer.json`.
- **Preprocessor File**:
  `preprocessor.json`.

A short sketch of loading the tokenizer from these files follows below.

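As a hedged illustration (assuming the repository files are downloaded into a local directory, written here as the placeholder `path/to/preset`, and the standard `keras_hub` `from_preset` API), the tokenizer can be inspected on its own:

```python
import keras_hub

# Sketch only: "path/to/preset" stands for a local directory containing
# tokenizer.json (and the other files listed above) from this repository.
tokenizer = keras_hub.models.GemmaTokenizer.from_preset("path/to/preset")

print(tokenizer.vocabulary_size())        # expected: 256000
ids = tokenizer("Your input text here.")  # token ids for a single string
print(ids)
```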

---

## Setup and Usage

1. **Dependencies**:
   Ensure the following libraries are installed:
   ```bash
   pip install keras keras_hub
   ```

2. **Model Loading**:
   The model can be loaded from the saved preset (the directory containing the configuration, preprocessor, and tokenizer files listed above):
   ```python
   import keras_hub

   # Load the saved GemmaCausalLM, including its preprocessor, from a
   # preset directory or hub handle.
   model = keras_hub.models.GemmaCausalLM.from_preset("path/to/preset")
   ```

3. **Inference**:
   The attached preprocessor tokenizes input text automatically, so predictions can be generated directly from raw strings.

   ```python
   # generate() applies the preprocessor, decodes autoregressively, and
   # returns detokenized text.
   output = model.generate("Your input text here.", max_length=128)
   print(output)
   ```

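Optionally, the decoding strategy can be changed before calling `generate()`. This is a minimal sketch using the `keras_hub` sampler API; available samplers and their defaults may vary across versions.

```python
import keras_hub

# Switch from the default sampler to top-k sampling for more varied output.
model.compile(sampler=keras_hub.samplers.TopKSampler(k=50))

print(model.generate("Your input text here.", max_length=128))
```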

---

## Contributions

Feel free to contribute to this repository by improving configurations, extending functionality, or reporting issues.


---

## License

This model is distributed under the Gemma license, as declared in the model card metadata above. See the LICENSE file for details.