Update README.md
README.md CHANGED
@@ -103,8 +103,8 @@ After completing the globally distributed pretraining phase, we applied several
 First, we conducted an extensive series of 16 Supervised Fine-Tuning (SFT) trainings, with individual runs ranging from 1 to 3.3 billion tokens each. The most successful configuration used 2.4 billion training tokens over 3 epochs. We used MergeKit, EvolKit, and DistillKit from Arcee AI to combine the models, generate the data sets, and distill the logits, respectively. For training data, we used a diverse set of high-quality datasets:
 
 1. **New Datasets** (released with INTELLECT-1):
-   - arcee-ai/EvolKit-75k (generated via EvolKit)
-   - arcee-ai/Llama-405B-Logits
+   - [arcee-ai/EvolKit-75k (generated via EvolKit)](https://huggingface.co/datasets/arcee-ai/EvolKit-75K)
+   - [arcee-ai/Llama-405B-Logits](https://huggingface.co/datasets/arcee-ai/LLama-405B-Logits)
    - arcee-ai/The-Tomb
 
 2. **Instruction Following**:
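The updated entries now link to dataset repositories on the Hugging Face Hub. As a minimal sketch of how one of them could be pulled with the `datasets` library (the repository id is taken from the link target above; the `train` split name and the record schema are assumptions, not documented in this diff):

```python
from datasets import load_dataset

# Stream the EvolKit-generated SFT dataset referenced in the updated link,
# avoiding a full local download. The split name is assumed to be "train".
evolkit = load_dataset("arcee-ai/EvolKit-75K", split="train", streaming=True)

# Peek at one record to inspect its fields (the schema varies by dataset).
first_example = next(iter(evolkit))
print(first_example.keys())
```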