point to svg for long msl image
README.md
CHANGED
@@ -118,7 +118,7 @@ Figure 4: Performance at 7B model size
 ## Long Sequence Lengths
 To enable long sequence applications, we use ALiBi position embeddings and trained on 470B tokens at the context length of 2,048 followed by 157B of tokens trained at 8,192 context length. To assess BTLM’s long sequence capability, we evaluate it on SlimPajama test set with 32,768 context length and plot loss at each token position. Although ALiBi allows extrapolation in theory, 2,048 context length training alone does not extrapolate well in practice. Thankfully variable sequence length training allows for substantially improved extrapolation. BTLM-3B extrapolates well up to 10k context length but the performance degrades slightly beyond this.
 
-![figure_5_image](./figure_5_xentropy_with_sequence_lengths.
+![figure_5_image](./figure_5_xentropy_with_sequence_lengths.svg)
 Figure 5: BTLM-3B model's cross-entropy evaluation on the SlimPajama’s test set. Inference performed on the extrapolated sequence length of 32,768 tokens.
 
 ## Model Details
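
The README context in this hunk describes plotting loss at each token position, which is the data behind the Figure 5 image that this change re-points. As a minimal sketch of that measurement, and not the authors' evaluation code, the snippet below averages cross-entropy per position over a set of sequences. The function names `log_softmax` and `per_position_loss` and the nested-list input layout are assumptions made for illustration; a real evaluation would take logits from the model and would typically use a tensor library.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of vocabulary logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def per_position_loss(batch_logits, batch_targets):
    """Mean cross-entropy at each token position, averaged over sequences.

    Hypothetical layout (an assumption for this sketch):
      batch_logits[s][t]  -> list of vocabulary logits for sequence s, position t
      batch_targets[s][t] -> index of the correct next token at that position
    All sequences are assumed to have the same length.
    """
    seq_len = len(batch_targets[0])
    totals = [0.0] * seq_len
    for logits_seq, targets_seq in zip(batch_logits, batch_targets):
        for t, (logits, target) in enumerate(zip(logits_seq, targets_seq)):
            # Negative log-probability of the correct token at position t.
            totals[t] += -log_softmax(logits)[target]
    n_sequences = len(batch_targets)
    return [total / n_sequences for total in totals]
```

Plotting the returned list against the position index gives a curve of the kind the caption describes. A curve that stays flat or keeps falling past the training context length would indicate good extrapolation, while a rising curve would indicate degradation.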