Commit bcdc052 · 1 Parent(s): fe798c4
committed by root

update short benchmark

Files changed (1): README.md +15 -1
README.md CHANGED
@@ -28,7 +28,7 @@ Large language models (LLMs) with extended context windows have made significant
 
 ## NExtLong Models
 
-
+### Long-context Benchmarks
 Our released models are listed as follows. You can import these models by using [HuggingFace's Transformers](https://github.com/huggingface/transformers). All models are trained on long-context data synthesized by [fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) and [Cosmopedia v2](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus).
 
 | Model | Avg. | Recall | RAG | ICL | Re-rank | LongQA | RULER |
@@ -46,6 +46,20 @@ We released our Instruct model, which is based on our Llama-3-8B-NExtLong-512K-B
 
 In addition, fine-tuning using the [ultrachat](https://huggingface.co/datasets/stingning/ultrachat) dataset can also yield good results, as we reported in Section 5.2 of the [NExtLong paper](https://arxiv.org/pdf/2501.12766).
 
+
+### Short-context Benchmarks
+
+| Model | AVG | HellaSwag | Lambada_OpenAI | ARC-Challenge | ARC-Easy | PIQA | WinoGrande | Logiqa | MMLU |
+|----------------------------|-------|-----------|----------------|---------------|----------|-------|------------|--------|-------|
+| **Meta-Llama-3-8B-Instruct** | 0.6332 | 0.5773 | 0.7171 | 0.5316 | 0.8165 | 0.7889 | 0.7198 | 0.2765 | 0.6376 |
+| **NextLong-Llama-3-8B-Instruct** | 0.6410 | 0.5953 | 0.7242 | 0.5188 | 0.8224 | 0.8079 | 0.7324 | 0.3041 | 0.6232 |
+
+Compared with Meta-Llama-3-8B-Instruct, NextLong-Llama-3-8B-Instruct shows no degradation on the short-context benchmarks.
+
+
+
 <a id="NExtLong-datasets"></a>
 
 ## NExtLong Datasets
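As a sanity check on the short-context table added by this commit: the AVG column appears to be the unweighted mean of the eight per-task scores (this is an assumption about how the authors computed it, but it matches the reported numbers). A minimal script to verify:

```python
# Check that the AVG column of the short-context benchmark table is the
# simple (unweighted) mean of the eight task scores -- an assumption,
# but one that reproduces the reported values after rounding.
scores = {
    "Meta-Llama-3-8B-Instruct":
        [0.5773, 0.7171, 0.5316, 0.8165, 0.7889, 0.7198, 0.2765, 0.6376],
    "NextLong-Llama-3-8B-Instruct":
        [0.5953, 0.7242, 0.5188, 0.8224, 0.8079, 0.7324, 0.3041, 0.6232],
}
reported = {
    "Meta-Llama-3-8B-Instruct": 0.6332,
    "NextLong-Llama-3-8B-Instruct": 0.6410,
}

for model, vals in scores.items():
    avg = sum(vals) / len(vals)
    # Both means agree with the table's AVG column to four decimal places.
    print(f"{model}: mean={avg:.4f} reported={reported[model]:.4f}")
```

Both rows check out, supporting the claim that the long-context training did not degrade short-context performance on average.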