---
language:
- en
- lug
tags:
- llama-3.1
- gemma-2b
- finetuned
- english-luganda
- translation
- peft
- qlora
---
# final_model_8b_16
This model is fine-tuned for bidirectional English-Luganda translation. It was trained with QLoRA (Quantized Low-Rank Adaptation) on top of the LLaMA-3.1-8B base model.
## Model Details
### Base Model Information
- Base model: unsloth/Meta-Llama-3.1-8B
- Model family: LLaMA-3.1-8B
- Type: Base
- Original model size: 8B parameters
### Training Configuration
- Training method: QLoRA (4-bit quantization)
- LoRA rank (r): 16
- LoRA alpha: 16
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- LoRA dropout: 0
- Learning rate: 2e-5
- Batch size: 2
- Gradient accumulation steps: 4
- Max sequence length: 2048
- Weight decay: 0.01
- Training steps: 100,000
- Warmup steps: 1000
- Save interval: 10,000 steps
- Optimizer: AdamW (8-bit)
- LR scheduler: Cosine
- Mixed precision: bf16
- Gradient checkpointing: Enabled (unsloth)
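
For reference, the hyperparameters above map roughly onto the following unsloth + TRL setup. This is a minimal sketch for illustration, not the exact training script: the dataset here is a placeholder, and argument names can differ slightly across unsloth/trl versions.

```python
# Sketch of the QLoRA setup described above (not the exact training script).
from unsloth import FastLanguageModel
from datasets import Dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the base model in 4-bit for QLoRA training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters with the configuration listed in this card.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Placeholder dataset; replace with the formatted English-Luganda corpus.
train_dataset = Dataset.from_list([{"text": "...formatted training example..."}])

# Training arguments matching the values above.
args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    weight_decay=0.01,
    max_steps=100_000,
    warmup_steps=1000,
    save_steps=10_000,
    lr_scheduler_type="cosine",
    optim="adamw_8bit",
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=args,
)
trainer.train()
```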
### Dataset Information
- Training data: Parallel English-Luganda corpus
- Data sources:
- SALT dataset (salt-train-v1.4)
- Extracted parallel sentences
- Synthetic code-mixed data
- Bidirectional translation: Trained on both the English→Luganda and Luganda→English directions (see the formatting sketch below)
- Total training examples: Varies by direction
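
As a rough illustration of the bidirectional setup, each parallel sentence pair can be expanded into two training examples, one per direction, using the instruction format shown in the Usage section below. The field names (`english`, `luganda`) are placeholders, not the actual SALT column names.

```python
# Illustration only: build both translation directions from one parallel pair.
# Field names are placeholders, not the actual SALT schema.
PROMPT = """Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.
### Instruction:
Translate the following text to {target_lang}
### Input:
{source_text}
### Response:
{target_text}"""

def make_bidirectional_examples(pair: dict) -> list[str]:
    """Return one formatted example per translation direction."""
    return [
        PROMPT.format(target_lang="Luganda",
                      source_text=pair["english"],
                      target_text=pair["luganda"]),
        PROMPT.format(target_lang="English",
                      source_text=pair["luganda"],
                      target_text=pair["english"]),
    ]

examples = make_bidirectional_examples(
    {"english": "Good morning.", "luganda": "Wasuze otya nno."}
)
```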
### Usage
This model uses an instruction-based prompt format:
```
Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.
### Instruction:
Translate the following text to [target_lang]
### Input:
[input text]
### Response:
[translation]
```
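
A minimal inference sketch with transformers and peft is shown below. The repository id is a placeholder, and it is assumed that the published weights are a LoRA adapter on top of the base model; adjust the loading code if the weights are merged.

```python
# Minimal inference sketch (assumes the weights are a PEFT/LoRA adapter;
# the repository id below is a placeholder).
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

ADAPTER_ID = "your-username/final_model_8b_16"  # placeholder path

# If the adapter repo has no tokenizer, load it from the base model instead.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)
model = AutoPeftModelForCausalLM.from_pretrained(
    ADAPTER_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = """Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.
### Instruction:
Translate the following text to Luganda
### Input:
Good morning, how are you?
### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens and keep only the generated translation.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```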
## Training Infrastructure
- Trained using unsloth optimization library
- Hardware: Single A100 GPU
- Quantization: 4-bit training enabled
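
In plain transformers/bitsandbytes terms, 4-bit QLoRA training typically corresponds to a quantization config like the one below. This is a generic sketch of the standard QLoRA recipe; the exact quantization settings used for this model are not stated in the card.

```python
# Typical QLoRA-style 4-bit quantization config (assumed, not confirmed by this card).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # NF4 quantization from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
```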
## Limitations
- The model is specialized for English-Luganda translation
- Performance may vary based on domain and complexity of text
- Limited to the training context length of 2,048 tokens
## Citation and Contact
If you use this model, please cite:
- Original LLaMA-3.1 model by Meta AI
- QLoRA paper: Dettmers et al. (2023)
- unsloth optimization library