root committed bcdc052 (1 parent: fe798c4)

update short benchmark
README.md
CHANGED
@@ -28,7 +28,7 @@ Large language models (LLMs) with extended context windows have made significant
 
 ## NExtLong Models
 
-
+### Long-context Benchmarks
 Our released models are listed as follows. You can import these models by using [HuggingFace's Transformers](https://github.com/huggingface/transformers). All models are trained on long-context data synthesized by [fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) and [Cosmopedia v2](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus).
 
 | Model | Avg. | Recall | RAG | ICL | Re-rank | LongQA | RULER |
@@ -46,6 +46,20 @@ We released our Instruct model, which is based on our Llama-3-8B-NExtLong-512K-B
 
 In addition, fine-tuning using the [ultrachat](https://huggingface.co/datasets/stingning/ultrachat) dataset can also yield good results, as we reported in Section 5.2 of the [NExtLong paper](https://arxiv.org/pdf/2501.12766).
 
+
+
+### Short-context Benchmarks
+
+
+| Model | AVG | HellaSwag | Lambada_OpenAI | ARC-Challenge | ARC-Easy | PIQA | WinoGrande | Logiqa | MMLU |
+|----------------------------|-------|-----------|----------------|---------------|----------|-------|------------|--------|-------|
+| **Meta-Llama-3-8B-Instruct** | 0.6332 | 0.5773 | 0.7171 | 0.5316 | 0.8165 | 0.7889 | 0.7198 | 0.2765 | 0.6376 |
+| **NextLong-Llama-3-8B-Instruct** | 0.6410 | 0.5953 | 0.7242 | 0.5188 | 0.8224 | 0.8079 | 0.7324 | 0.3041 | 0.6232 |
+
+Compared with Meta-Llama-3-8B-Instruct, NextLong-Llama-3-8B-Instruct shows no degradation on the short-context benchmarks.
+
+
+
 <a id="NExtLong-datasets"></a>
 
 ## NExtLong Datasets
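The README change above notes that the released models can be imported with HuggingFace's Transformers. A minimal loading sketch follows; the repo id `Llama-3-8B-NExtLong-512K-Instruct` and the prompt are assumptions for illustration, so substitute the actual model id from the NExtLong collection on the Hub.

```python
# Minimal sketch of loading a released NExtLong model with HuggingFace's
# Transformers, as the README describes. The default repo id below is an
# assumption: replace it with the actual model id from the Hub.

def load_nextlong(repo_id: str = "Llama-3-8B-NExtLong-512K-Instruct"):
    """Return a (tokenizer, model) pair for the given Hub repo id."""
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_nextlong()
    prompt = "Summarize the key idea of long-context training in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The base and instruct checkpoints load the same way; only the repo id changes.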