francislabounty committed • 1ca95a8 (parent: 66d34a3)

Update README.md

README.md CHANGED
```diff
@@ -12,7 +12,7 @@ language:
 - Effective batch size: 128
 - Learning Rate: 2e-5 with linear decay
 - Epochs: 1
-- Base model trained with QLoRA (rank 64, alpha 16) and MoE adapters/routers trained in bf16
+- [Base model](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) trained with QLoRA (rank 64, alpha 16) and MoE adapters/routers trained in bf16
 - Num Experts: 16
 - Top K: 4
```
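The "Num Experts: 16, Top K: 4" settings describe sparse MoE routing: for each token, a router scores all 16 experts and only the 4 highest-scoring ones are activated, with their outputs combined by softmax-normalized gate weights. A minimal NumPy sketch of that routing step; the linear router, shapes, and variable names here are illustrative assumptions, not the model's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k, hidden = 16, 4, 32   # values from the model card; hidden size is made up

W = rng.standard_normal((hidden, num_experts))  # assumed linear router weights
x = rng.standard_normal((5, hidden))            # 5 example token embeddings

logits = x @ W                                  # (5, 16) per-token expert scores
idx = np.argsort(logits, axis=-1)[:, -top_k:]   # indices of the 4 highest-scoring experts
top = np.take_along_axis(logits, idx, axis=-1)  # their logits only
gate = np.exp(top - top.max(-1, keepdims=True))
gate = gate / gate.sum(-1, keepdims=True)       # softmax over the 4 selected experts
```

Each token's output would then be the gate-weighted sum of its 4 selected experts' outputs; the other 12 experts are skipped entirely, which is what keeps inference cost well below a dense 16-expert model.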