root committed bcdc052 (1 parent: fe798c4)

update short benchmark
README.md
CHANGED
@@ -28,7 +28,7 @@ Large language models (LLMs) with extended context windows have made significant
 
 ## NExtLong Models
 
-
+### Long-context Benchmarks
 Our released models are listed as follows. You can import these models by using [HuggingFace's Transformers](https://github.com/huggingface/transformers). All models are trained on long-context data synthesized by [fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) and [Cosmopedia v2](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus).
 
 | Model | Avg. | Recall | RAG | ICL | Re-rank | LongQA | RULER |
@@ -46,6 +46,20 @@ We released our Instruct model, which is based on our Llama-3-8B-NExtLong-512K-B
 
 In addition, fine-tuning using the [ultrachat](https://huggingface.co/datasets/stingning/ultrachat) dataset can also yield good results, as we reported in Section 5.2 of the [NExtLong paper](https://arxiv.org/pdf/2501.12766).
 
+
+
+### Short-context Benchmarks
+
+
+| Model | AVG | HellaSwag | Lambada_OpenAI | ARC-Challenge | ARC-Easy | PIQA | WinoGrande | Logiqa | MMLU |
+|----------------------------|-------|-----------|----------------|---------------|----------|-------|------------|--------|-------|
+| **Meta-Llama-3-8B-Instruct** | 0.6332 | 0.5773 | 0.7171 | 0.5316 | 0.8165 | 0.7889 | 0.7198 | 0.2765 | 0.6376 |
+| **NextLong-Llama-3-8B-Instruct** | 0.6410 | 0.5953 | 0.7242 | 0.5188 | 0.8224 | 0.8079 | 0.7324 | 0.3041 | 0.6232 |
+
+Compared with Meta-Llama-3-8B-Instruct, NextLong-Llama-3-8B-Instruct shows no degradation on the short-context benchmarks.
+
+
+
 <a id="NExtLong-datasets"></a>
 
 ## NExtLong Datasets
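The README change above notes that the released models can be imported with HuggingFace's Transformers. A minimal loading sketch follows; the repo id `Llama-3-8B-NExtLong-512K-Instruct` and the prompt are assumptions for illustration, so substitute the actual model id from the NExtLong collection on the Hub.

```python
# Minimal sketch of loading a released NExtLong model with HuggingFace's
# Transformers, as the README describes. The default repo id below is an
# assumption: replace it with the actual model id from the Hub.

def load_nextlong(repo_id: str = "Llama-3-8B-NExtLong-512K-Instruct"):
    """Return a (tokenizer, model) pair for the given Hub repo id."""
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_nextlong()
    prompt = "Summarize the key idea of long-context training in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The base and instruct checkpoints load the same way; only the repo id changes.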