---
license: apache-2.0
datasets:
- EleutherAI/the_pile_deduplicated
language:
- en
---

Pythia-2.8B Deduped 4K is a [Pythia-2.8B Deduped](https://huggingface.co/EleutherAI/pythia-2.8b-deduped) model fine-tuned with a 4096-token context length.
Training resumed from EleutherAI's step 143,000 checkpoint and continued on The Pile v1 Deduped (threshold=0.87).
This particular model is from a checkpoint captured at step 175,500, after an additional 134,217,728,000 tokens of training.
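
The checkpoint can be loaded like any other Pythia (GPT-NeoX) causal LM with `transformers`. A minimal sketch; the repo id below is a placeholder to replace with this model's actual Hub path:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: substitute the actual path of this model on the Hub.
model_id = "your-org/pythia-2.8b-deduped-4k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Pythia is a GPT-NeoX-style causal LM, so the standard generate() API applies,
# now with contexts of up to 4096 tokens.
inputs = tokenizer("The Pile is a large, diverse corpus", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```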

Note: Sequence length warmup was not used to move up from 2048 but, in hindsight, it should have been.
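
For context, sequence length warmup ramps the training context up gradually instead of jumping straight from 2048 to 4096. A minimal illustrative sketch; the linear schedule, warmup horizon, and rounding below are assumptions, not this model's recipe:

```python
def seq_len_at_step(step: int, start_len: int = 2048, end_len: int = 4096,
                    warmup_steps: int = 1000) -> int:
    """Return the training sequence length to use at a given optimizer step."""
    if step >= warmup_steps:
        return end_len
    # Linear ramp from start_len to end_len over warmup_steps steps.
    length = start_len + (end_len - start_len) * step / warmup_steps
    # Round down to a multiple of 64 to keep shapes hardware-friendly.
    return int(length) // 64 * 64
```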