francislabounty committed • 1ca95a8 (parent: 66d34a3)

Update README.md

README.md CHANGED
```diff
@@ -12,7 +12,7 @@ language:
 - Effective batch size: 128
 - Learning Rate: 2e-5 with linear decay
 - Epochs: 1
-- Base model trained with QLoRA (rank 64, alpha 16) and MoE adapters/routers trained in bf16
+- [Base model](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) trained with QLoRA (rank 64, alpha 16) and MoE adapters/routers trained in bf16
 - Num Experts: 16
 - Top K: 4
```
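The "Num Experts: 16, Top K: 4" settings describe sparse MoE routing: for each token, a router scores all 16 experts and only the 4 highest-scoring ones are activated, with their outputs combined by softmax-normalized gate weights. A minimal NumPy sketch of that routing step; the linear router, shapes, and variable names here are illustrative assumptions, not the model's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k, hidden = 16, 4, 32   # values from the model card; hidden size is made up

W = rng.standard_normal((hidden, num_experts))  # assumed linear router weights
x = rng.standard_normal((5, hidden))            # 5 example token embeddings

logits = x @ W                                  # (5, 16) per-token expert scores
idx = np.argsort(logits, axis=-1)[:, -top_k:]   # indices of the 4 highest-scoring experts
top = np.take_along_axis(logits, idx, axis=-1)  # their logits only
gate = np.exp(top - top.max(-1, keepdims=True))
gate = gate / gate.sum(-1, keepdims=True)       # softmax over the 4 selected experts
```

Each token's output would then be the gate-weighted sum of its 4 selected experts' outputs; the other 12 experts are skipped entirely, which is what keeps inference cost well below a dense 16-expert model.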