bhenrym14 committed · Commit 8e02c8a · 1 Parent(s): 2d8e465

Update README.md

Files changed (1): README.md (+14 −8)
README.md CHANGED
@@ -25,6 +25,8 @@ Please comment with any questions and feedback on how this model performs, espec
 
 Ooba use: Be sure to increase the `Truncate the prompt up to this length` parameter to 16384 to utilize the full context capabilities. Again, `trust_remote_code=True` is imperative.
 
+**There may be issues loading this model on Windows systems due to the decimal in "2.1" in the model name. If you have trouble loading the model, try renaming the model directory to omit the decimal.**
+
 ## Motivation
 
 [Yet another RoPE extensioN method (YaRN)](https://github.com/jquesnelle/yarn) is a novel method of extending the useful context of pretrained LLMs with RoPE architectures, with minimal additional training requirements. The method is the result of efforts to mitigate the shortcomings of other approaches such as Position Interpolation (PI) and NTK-aware scaling. This model is an attempt to enable the community to assess the capabilities of this extension method in real-world applications.
@@ -33,14 +35,18 @@ Ooba use: Be sure to increase the `Truncate the prompt up to this length` parame
 
 | Context (tokens) | **bhenrym14/airoboros-l2-13b-2.1-YaRN-64k** | bhenrym14/airoboros-l2-13b-PI-16k-fp16 | bhenrym14/airophin-v2-13b-PI-8k-fp16 | bhenrym14/airophin-13b-pntk-16k-fp16 | bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-fp16 | bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | jondurbin/airoboros-l2-13b-gpt4-1.4.1 |
 | --- | --- | --- | --- | --- | --- | --- | --- |
-| 512 | | 7.67 | 7.38 | 7.62 | 8.24 | 7.90 | **7.23** |
-| 1024 | | 6.15 | 5.99 | 6.20 | 6.71 | 6.17 | **5.85** |
-| 2048 | | 5.29 | 5.22 | 5.38 | 5.87 | 5.23 | **5.07** |
-| 4096 | | 4.94 | 4.90 | 5.08 | 5.50 | 4.91 | **4.77** |
-| 8192 | | **4.71** | **4.71** | 4.90 | 5.32 | Not Tested | 57.1 |
-| 12000 | | **4.54** | 55 | 4.82 | 56.1 | Not Tested | Not Tested |
-
-
+| 512 | 7.64 | 7.67 | 7.38 | 7.62 | 8.24 | 7.90 | **7.23** |
+| 1024 | 6.15 | 6.15 | 5.99 | 6.20 | 6.71 | 6.17 | **5.85** |
+| 2048 | 5.29 | 5.29 | 5.22 | 5.38 | 5.87 | 5.23 | **5.07** |
+| 4096 | 4.93 | 4.94 | 4.90 | 5.08 | 5.50 | 4.91 | **4.77** |
+| 8192 | **4.69** | 4.71 | 4.71 | 4.90 | 5.32 | Not Tested | 57.1 |
+| 12000 | -- | 4.54 | 55 | 4.82 | 56.1 | Not Tested | Not Tested |
+
+- Despite having a far higher scaling factor, this model is competitive with bhenrym14/airophin-v2-13b-PI-8k-fp16 at short context lengths.
+- At longer context lengths, this model outperforms all others tested here (in terms of perplexity).
+- Overall, it appears that YaRN can extend the context window with minimal impact on short-context performance compared to other methods. Furthermore, it does so with a far higher scaling factor, which with other methods (especially PI) resulted in serious performance degradation at shorter context lengths.
+- Both the YaRN and Code Llama papers suggest that YaRN and NTK scaling may ameliorate the "U-shaped" attention issue, where long-context models struggle to attend to information in the middle of the context window. Further study is needed to evaluate this. Anecdotal feedback from the community on this issue would be appreciated!
 
 ## Prompting:
 
 Prompting differs with the airoboros 2.1 models. See [jondurbin/airoboros-l2-13b-2.1](https://huggingface.co/jondurbin/airoboros-l2-13b-2.1)
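For context on the scaling methods compared in this diff, here is a minimal sketch (illustrative only, not taken from the YaRN repo) of how Position Interpolation and NTK-aware scaling modify RoPE's per-dimension frequencies. A 16x factor corresponds to extending Llama-2's 4096-token window toward 64k:

```python
import math

def rope_inv_freq(dim, base=10000.0):
    # Standard RoPE: one inverse frequency per pair of hidden dimensions.
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]

def pi_inv_freq(dim, scale, base=10000.0):
    # Position Interpolation: divide every frequency by the scale factor,
    # which also compresses the high frequencies that encode local order.
    return [f / scale for f in rope_inv_freq(dim, base)]

def ntk_inv_freq(dim, scale, base=10000.0):
    # NTK-aware scaling: raise the base instead, so the highest frequency
    # is untouched while the lowest is stretched by roughly `scale`.
    new_base = base * scale ** (dim / (dim - 2))
    return rope_inv_freq(dim, new_base)

scale = 16  # e.g. 4096 -> 65536 tokens
std, pi, ntk = rope_inv_freq(128), pi_inv_freq(128, scale), ntk_inv_freq(128, scale)
print(std[0], pi[0], ntk[0])  # NTK leaves the fastest dimension at 1.0; PI shrinks it 16x
```

YaRN refines this further by interpolating per dimension between these regimes and adding an attention temperature; see the linked repository for the actual implementation.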
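The numbers in the comparison table above are perplexities (lower is better). As a reminder of what is being reported, perplexity is the exponential of the mean per-token negative log-likelihood; a minimal sketch:

```python
import math

def perplexity(nlls):
    # nlls: per-token negative log-likelihoods (natural log) over the eval set.
    return math.exp(sum(nlls) / len(nlls))

# A model that assigns every token probability 1/2 has perplexity 2:
print(perplexity([math.log(2)] * 4))  # 2.0 (up to float rounding)
```

Seen this way, the jump from ~4.7 to 57.1 for the base model at 8192 tokens reflects it effectively losing the ability to predict tokens beyond its trained context window.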