---
license: mit
datasets:
- mshojaei77/Persian_sft
language:
- fa
base_model: google/gemma-2-2b-it
tags:
- conversational
- text-generation
- persian
- gemma
- qlora
- fine-tuned
- experimental
library_name: transformers
pipeline_tag: text-generation
new_version: mshojaei77/gemma-2-2b-fa-v2
---

# Persian Gemma 2b - Conversational AI Experiment (Early Stage)

This repository presents **Persian Gemma 2b**, an **early-stage experimental model** derived from Google's Gemma-2-2b-it. It has been fine-tuned with QLoRA on the `mshojaei77/Persian_sft` dataset to explore its capabilities on **Persian conversational tasks**.

![Persian Gemma 2b](https://cdn-uploads.huggingface.co/production/uploads/6556b1bb85d43542fa1a8f91/_DF_F2oXNKXkFQixDQlF5.png)

## 1. Model Architecture

* **Base Model:** [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it)
* **Architecture Type:** Gemma2ForCausalLM
* **Model Size:** 2 billion parameters
* **Description:** Persian Gemma 2b inherits the architecture of Gemma-2-2b-it, a lightweight model known for its efficiency and strong performance for its size. It targets text-generation tasks and is particularly suited to conversational applications. The model uses standard transformer layers with attention mechanisms, enabling it to process and generate Persian text.

---

## 2. Training Details

* **Fine-tuning Method:** QLoRA (Quantized Low-Rank Adaptation)
  * QLoRA enables parameter-efficient fine-tuning: the base model is loaded in 4-bit precision and only small low-rank adapter weights are trained, which reduces compute and memory requirements (an illustrative `peft` equivalent is sketched at the end of this section).
  * LoRA Rank (r): 32
  * LoRA Alpha: 16
  * LoRA Dropout: 0.05
  * LoRA Target Modules: `['down_proj', 'gate_proj', 'k_proj', 'o_proj', 'q_proj', 'up_proj', 'v_proj']` (linear layers)
* **Training Dataset:** [mshojaei77/Persian_sft](https://huggingface.co/datasets/mshojaei77/Persian_sft)
* **Training Steps:** 20 (extremely limited - **proof of concept**)
* **Hardware:** Kaggle Notebook, T4 GPU
* **Software:** Axolotl library
* **Optimizer:** paged_adamw_32bit
* **Learning Rate Scheduler:** cosine
* **Learning Rate:** 0.0002
* **Micro Batch Size:** 1
* **Gradient Accumulation Steps:** 1
* **Sequence Length:** 2048
* **Sample Packing:** Enabled (`sample_packing: true`)
* **Precision / Quantization:** FP16 mixed precision (`fp16: true`, `bf16: false`); base model loaded in 4-bit (`load_in_4bit: true`)
* **Gradient Checkpointing:** Enabled (`gradient_checkpointing: true`)
* **Attention Implementation:** SDPA (default; Flash Attention explicitly disabled - `flash_attention: false`)
* **Tokenizer:** Tokenizer from the base model `google/gemma-2-2b-it`
* **Chat Template:** gemma
* **Training Objective:** Supervised fine-tuning (SFT) to adapt the base model for Persian conversational responses, guided by the `Persian_sft` dataset.
* **Validation Set:** **None used** in this preliminary experiment.

> **Critical Note:** The model was trained for an exceptionally short duration (20 steps). This is insufficient for robust learning and generalization. Expect significantly under-optimized performance.
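For readers who want to reproduce a comparable setup outside Axolotl, the hyperparameters above roughly map onto the following `peft`/`bitsandbytes` configuration. This is an illustrative sketch, not the actual training script used for this run (the experiment was driven by an Axolotl YAML config), and library defaults may differ slightly.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA: load the frozen base model in 4-bit (load_in_4bit: true)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # fp16 mixed precision, bf16 disabled
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    quantization_config=bnb_config,
    attn_implementation="sdpa",  # Flash Attention was disabled in this run
)

# Low-rank adapters on the same linear projections listed above
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["down_proj", "gate_proj", "k_proj", "o_proj",
                    "q_proj", "up_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```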
---

## 3. Dataset Information

* **Dataset Name:** [mshojaei77/Persian_sft](https://huggingface.co/datasets/mshojaei77/Persian_sft)
* **Dataset Description:** The `Persian_sft` dataset is a collection of Persian conversations designed for instruction fine-tuning of language models. It contains examples of user queries and desired model responses in Persian, formatted for conversational fine-tuning.
* **Dataset Type:** Supervised Fine-tuning (SFT) dataset for conversational AI.
* **Language:** Primarily Persian (fa).

---

## 4. Intended Use

**Intended Use Cases:**

* **Research & Experimentation:** Primary use is to investigate the feasibility of fine-tuning Gemma-2-2b-it for Persian conversational tasks and to serve as a starting point for further research.
* **Educational Purposes:** Demonstration of QLoRA fine-tuning with Axolotl and a practical example for learning about Persian language model development.
* **Community Development:** To encourage community contributions towards better Persian language models and resources.
* **Prototyping (with caution):** Rapid prototyping and exploration of Persian conversational AI applications, strictly acknowledging the model's limitations and preliminary state.

---

## 5. Limitations

* **Severe Under-training:** Trained for only 20 steps, leading to significantly sub-optimal performance across the board.
* **Lack of Validation:** The absence of a validation set prevents monitoring of generalization and increases the risk of overfitting.
* **Limited Fluency and Coherence:** May produce grammatically incorrect, disfluent, or incoherent Persian text, especially in complex or lengthy conversations.
* **Hallucinations and Factual Errors:** Prone to generating factually incorrect or nonsensical information; verification of output is crucial.
* **Bias:** Likely inherits, and may amplify, biases from the base model and the fine-tuning dataset, leading to biased or unfair outputs.
* **Poor Generalization:** Performance is expected to degrade significantly on data outside the training distribution (different conversational styles, topics, or domains).
* **Limited Conversational Abilities:** May struggle with complex conversational turns, context maintenance, and nuanced understanding of user intent.
* **Ethical Concerns:** The potential for biased, inaccurate, or inappropriate output raises ethical concerns, especially in sensitive applications.

---

## 6. Performance Metrics

**Current Evaluation:**

* **No formal evaluation has been conducted for this preliminary model due to its extremely limited training.** Performance is expected to be significantly below optimal.

---

## 7. How to Use

```python
import torch
from transformers import pipeline

# Initialize the text generation pipeline
pipe = pipeline(
    "text-generation",
    model="mshojaei77/Gemma-2b-fa",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # or "mps" on Macs with Apple Silicon
)

# Prepare the conversation; the Gemma chat template bundled with the tokenizer
# is applied automatically when a list of messages is passed to the pipeline
messages = [
    {"role": "user", "content": "سلام چطوری؟"},
]

# Generate a response with a maximum of 512 new tokens
outputs = pipe(messages, max_new_tokens=512)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()

print(assistant_response)
# Example output (illustrative - output quality may vary significantly):
# سلام! من خوبم، ممنون. شما چطوری؟ 😊
```
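If you prefer not to use the pipeline, a roughly equivalent lower-level path applies the tokenizer's Gemma chat template explicitly. This is an untested sketch under the same assumptions as the example above (CUDA-capable machine, bfloat16 weights); outputs will vary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mshojaei77/Gemma-2b-fa"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "سلام چطوری؟"}]

# The Gemma chat template ships with the tokenizer, so no manual prompt formatting is needed
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```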
**Important Usage Notes:**

* **`library_name: transformers` and `pipeline_tag: text-generation`:** Declared in the model card metadata so the model is discoverable and can be loaded with the standard `transformers` text-generation pipeline.
* **Chat Template:** The Gemma chat template is bundled with the tokenizer; passing a list of `{"role": ..., "content": ...}` messages to the pipeline applies it automatically, so no manual prompt formatting is required.
* **Hardware Recommendations:** A CUDA GPU is recommended; `device="mps"` works on Apple Silicon, though performance may vary. A 4-bit loading option for memory-constrained GPUs is sketched after these notes.
* **Output Quality:** Expect highly variable and often suboptimal output due to the limited training. Critical evaluation of generated text is essential.
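On memory-constrained GPUs (for example, the T4 class used for training), one option is to load the model in 4-bit with `bitsandbytes`. This is an illustrative, untested sketch; generation quality under quantized inference has not been checked for this checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantized weights keep the 2B model within a few GB of GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "mshojaei77/Gemma-2b-fa"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```

Generation then proceeds exactly as in the chat-template sketch above.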