Update README.md
README.md
CHANGED
Ooba use: Be sure to increase the `Truncate the prompt up to this length` parameter to 16384 to utilize the full context capabilities. Again, `trust_remote_code=True` is imperative.
**There may be issues loading this model on Windows systems due to the decimal in "2.1" in the model name. If you have trouble loading the model, try renaming the model directory to omit the decimal.**
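Outside of ooba, a minimal sketch of loading the model with the `transformers` library is shown below. The repo id, dtype, and generation settings are assumptions for illustration; only the need for `trust_remote_code=True` and the extended context come from the notes above.

```python
# Minimal sketch (not an official snippet): loading the model with transformers.
# The repo id, dtype, and generation settings are assumptions for illustration;
# trust_remote_code=True is required so the custom modeling code shipped with
# the model is actually used.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bhenrym14/airoboros-l2-13b-2.1-YaRN-64k"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # imperative, as noted above
    device_map="auto",
    torch_dtype="auto",
)

prompt = "..."  # long-context prompt goes here
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```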
## Motivation
[Yet another RoPE extensioN method (YaRN)](https://github.com/jquesnelle/yarn) is a novel method of extending the useful context of pretrained LLMs whose architectures employ RoPE, with minimal additional training requirements. This method is the consequence of efforts to mitigate the shortcomings of other methods such as Position Interpolation (PI) and NTK-Aware scaling. This model is an attempt to enable the community to assess the capabilities of this extension method in real-world applications.
| Context (tokens) | **bhenrym14/airoboros-l2-13b-2.1-YaRN-64k** | bhenrym14/airoboros-l2-13b-PI-16k-fp16 | bhenrym14/airophin-v2-13b-PI-8k-fp16 | bhenrym14/airophin-13b-pntk-16k-fp16 | bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-fp16 | bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | jondurbin/airoboros-l2-13b-gpt4-1.4.1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 512 | 7.64 | 7.67 | 7.38 | 7.62 | 8.24 | 7.90 | **7.23** |
| 1024 | 6.15 | 6.15 | 5.99 | 6.20 | 6.71 | 6.17 | **5.85** |
| 2048 | 5.29 | 5.29 | 5.22 | 5.38 | 5.87 | 5.23 | **5.07** |
| 4096 | 4.93 | 4.94 | 4.90 | 5.08 | 5.50 | 4.91 | **4.77** |
| 8192 | **4.69** | 4.71 | 4.71 | 4.90 | 5.32 | Not Tested | 57.1 |
| 12000 | -- | 4.54 | 55 | 4.82 | 56.1 | Not Tested | Not Tested |
- Despite having a far higher scaling factor, this model is competitive with bhenrym14/airophin-v2-13b-PI-8k-fp16 at short context lengths.
- At longer context lengths, this model outperforms all others tested here (in terms of perplexity).
- Overall, it appears that YaRN is capable of extending the context window with minimal impact on short-context performance when compared to other methods. Furthermore, it is able to do this with a far higher scaling factor, which with other methods (especially PI) resulted in serious performance degradation at shorter context lengths.
- Both the YaRN and Code Llama papers suggest that YaRN and NTK scaling may ameliorate the issue of "U-shaped" attention, where long-context models struggle to attend to information in the middle of the context window. Further study is needed to evaluate this. Anecdotal feedback from the community on this issue would be appreciated!
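The table above reports perplexity at several context lengths. The exact evaluation setup is not described in this section, so the following is only a rough sketch of how a comparable number could be computed at one context length; the dataset, chunking strategy, and model id are assumptions.

```python
# Rough sketch: perplexity at a fixed context length.
# Dataset, chunking strategy, and model id are assumptions; they may not match
# the setup used to produce the table above.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bhenrym14/airoboros-l2-13b-2.1-YaRN-64k"  # assumed repo id
context_len = 4096

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto", torch_dtype="auto"
).eval()

# Concatenate a test corpus and split it into non-overlapping windows.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids

nlls = []
for start in range(0, ids.size(1) - context_len + 1, context_len):
    chunk = ids[:, start : start + context_len].to(model.device)
    with torch.no_grad():
        # With labels == input_ids, the returned loss is the mean next-token NLL.
        nlls.append(model(chunk, labels=chunk).loss)

print(f"perplexity @ {context_len}:", torch.exp(torch.stack(nlls).mean()).item())
```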
## Prompting:
Prompting differs with the airoboros 2.1 models. See [jondurbin/airoboros-l2-13b-2.1](https://huggingface.co/jondurbin/airoboros-l2-13b-2.1) for the prompt format details.
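As a rough illustration only, the 2.1 models use a chat-style USER/ASSISTANT layout along these lines; the system prompt below is a placeholder, not the official wording, and the exact text and spacing are defined in the linked card.

```python
# Illustration only: chat-style prompt layout used by the airoboros 2.1 models.
# The system prompt below is a placeholder; see the linked jondurbin card for
# the exact wording and spacing.
system = "A chat."  # placeholder system prompt
user_message = "Summarize the attached report in three bullet points."

prompt = f"{system}\nUSER: {user_message}\nASSISTANT:"
print(prompt)
```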