---
language:
- ms
- en
- zh
- ta
---

# Llama 3.2 1B Malaysian Reasoning

Continued finetuning of https://huggingface.co/meta-llama/Llama-3.2-1B on a highly curated 1.2B-token Malaysian instruction dataset, including reasoning data.

## Improvement

1. 128k context length.
2. Supports responding in Mandarin, Tamil, Jawi, Manglish, Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu.
3. Can write code when prompted in Mandarin, Tamil, Jawi, Manglish, Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu.
4. Multi-turn conversations on Malaysian topics such as legislation, politics, religions and languages.
5. Standard RAG (retrieval-augmented generation).
6. Reasoning! Supports minimal reasoning in Mandarin, Tamil, Jawi, Manglish, Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu.

## MalayMMLU

```
```

## Training session

We trained in 2 stages:

1. Finetune on [Malaysian SFT](https://huggingface.co/datasets/mesolitica/Malaysian-SFT) so the model understands Malaysian context.
   - Wandb at https://wandb.ai/huseinzol05/lora-embedding-256-llama3.2-1b-small-malaysian-reasoning
2. Continue finetuning on [Malaysian Reasoning](https://huggingface.co/datasets/mesolitica/Malaysian-Reasoning), mixed with a small sample of [Malaysian SFT](https://huggingface.co/datasets/mesolitica/Malaysian-SFT), to turn it into a reasoning model.
   - Wandb at https://wandb.ai/huseinzol05/lora-embedding-256-llama3.2-1b-small-malaysian-reasoning-cont
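Stage 2 mixes the reasoning data with a small sample of the SFT data. The card does not say how that sample is drawn or how large it is, so the following is only a hypothetical sketch: `build_stage2_mix` and its `sft_fraction` parameter are illustrative names, and uniform random sampling of SFT rows is an assumption.

```python
import random


def build_stage2_mix(reasoning, sft, sft_fraction, seed=0):
    """Hypothetical stage-2 mix: every reasoning row plus a random SFT subset.

    sft_fraction is an assumed knob (fraction of SFT rows to keep); the real
    ratio used for this model is not stated in the card.
    """
    if not 0.0 <= sft_fraction <= 1.0:
        raise ValueError("sft_fraction must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the mix is reproducible
    n_sft = int(len(sft) * sft_fraction)
    mixed = list(reasoning) + rng.sample(list(sft), n_sft)
    rng.shuffle(mixed)  # interleave SFT rows among reasoning rows
    return mixed
```

Keeping every reasoning row and only subsampling SFT is one simple way to keep the stage-1 behaviour from being forgotten while the reasoning data dominates.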
  
## How we train

1. LoRA on `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"]`.
2. Rank 256 with alpha 512, i.e. a LoRA scaling factor (alpha / rank) of 2.0.
3. Multipacking with proper SDPA causal masking to prevent cross-document contamination, plus correct position ids for each packed document.
4. A forked Cut Cross-Entropy (CCE) loss that supports the LoRA `lm_head`, to reduce memory consumption.
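Item 3 packs several documents into one sequence. A minimal sketch of the idea, assuming each document is a list of token ids: position ids restart at 0 for every document, and the attention mask is block-diagonal and causal, so a token only sees earlier tokens of its own document. PyTorch's `torch.nn.functional.scaled_dot_product_attention` accepts a boolean `attn_mask` where `True` marks positions allowed to take part in attention, which is the convention used below. `pack` and `block_causal_mask` are illustrative names, not the functions in the linked repository.

```python
def pack(docs):
    """Concatenate documents and record per-token position and document ids."""
    input_ids, position_ids, doc_ids = [], [], []
    for d, doc in enumerate(docs):
        input_ids.extend(doc)
        position_ids.extend(range(len(doc)))  # restart positions per document
        doc_ids.extend([d] * len(doc))
    return input_ids, position_ids, doc_ids


def block_causal_mask(doc_ids):
    """mask[i][j] is True iff token i may attend to token j:
    same document and j is not in the future."""
    n = len(doc_ids)
    return [[doc_ids[i] == doc_ids[j] and j <= i for j in range(n)]
            for i in range(n)]


ids, pos, docs = pack([[10, 11, 12], [20, 21]])
mask = block_causal_mask(docs)
```

With a plain causal mask, token `20` would attend to `10`, `11` and `12` from an unrelated document; the block-diagonal mask prevents that, and resetting position ids keeps the second document's positions identical to what it would see unpacked.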

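Item 4 relies on the idea behind Cut Cross-Entropy: the loss for a token only needs the correct-class logit and the log-sum-exp over the vocabulary, so the full logits row never has to be materialized. The sketch below illustrates that idea in plain Python by streaming over vocabulary chunks with an online log-sum-exp; it is a simplified stand-in, not the forked GPU kernel, and `chunked_cross_entropy` and `chunk_size` are illustrative names. With a LoRA `lm_head`, each classifier row would be the effective weight W + (alpha / rank) B A; how the fork handles that is not shown here.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def chunked_cross_entropy(hidden, classifier, target, chunk_size):
    """-log softmax(classifier @ hidden)[target], computed chunk by chunk.

    Only one chunk of logits exists at a time; a running max and a running
    sum of exponentials (online log-sum-exp) replace the full logits row.
    """
    target_logit = dot(hidden, classifier[target])
    running_max, running_sum = -math.inf, 0.0
    for start in range(0, len(classifier), chunk_size):
        chunk = [dot(hidden, row) for row in classifier[start:start + chunk_size]]
        new_max = max(running_max, max(chunk))
        # Rescale the old sum to the new max, then add this chunk's terms.
        running_sum = (running_sum * math.exp(running_max - new_max)
                       + sum(math.exp(z - new_max) for z in chunk))
        running_max = new_max
    return running_max + math.log(running_sum) - target_logit


def reference_cross_entropy(hidden, classifier, target):
    """Naive version that materializes every logit, for comparison."""
    logits = [dot(hidden, row) for row in classifier]
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[target]
```

The chunked and naive versions agree up to floating-point rounding for any `chunk_size >= 1`; the saving comes from never holding a `[tokens, vocab]` logits tensor, which matters with Llama 3.2's large vocabulary.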
The low-rank adapters are published at [malayloraenjoyer/Llama-3.2-1B-Malaysian-Reasoning-LoRA](https://huggingface.co/malayloraenjoyer/Llama-3.2-1B-Malaysian-Reasoning-LoRA).
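Each adapter adds a trainable low-rank update to a frozen base weight. A minimal sketch of the standard LoRA formulation for the linear projections, h = W x + (alpha / rank) B A x, using the card's rank 256 and alpha 512; `lora_linear` and the toy matrices are illustrative only, and the `embed_tokens` adapter (a lookup rather than a matrix-vector product) is not shown.

```python
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj",
                  "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"]


def lora_scaling(rank, alpha):
    """Standard LoRA scaling applied to the low-rank update."""
    return alpha / rank


def matvec(m, v):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_linear(w, a, b, x, rank, alpha):
    """h = W x + (alpha / rank) * B (A x).

    w: d_out x d_in (frozen), a: rank x d_in, b: d_out x rank (trainable).
    """
    s = lora_scaling(rank, alpha)
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [h + s * u for h, u in zip(base, update)]


print(lora_scaling(256, 512))  # 2.0, the "alpha of 2.0" in the list above
```

Because B is usually initialised to zeros, the adapted layer starts out identical to the base layer and only drifts as A and B are trained.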

Source code at https://github.com/mesolitica/malaya/tree/master/session/small-malaysian-reasoning