bhenrym14 committed on
Commit b6de33f
1 Parent(s): 834b1fb

Update README.md

Files changed (1): README.md (+1, -1)
README.md CHANGED
@@ -19,7 +19,7 @@ The finetune was performed with 1x RTX 6000 Ada.
 
 YaRN is not implemented natively in `Transformers`. The YaRN pretrained model [NousResearch/Yarn-Llama-2-13b-64k](https://huggingface.co/NousResearch/Yarn-Llama-2-13b-64k) contains a drop-in llama architecture replacement that interfaces with the included configuration file. **To maximize compatibility, I have included the version that omits flash attention.** To run using `Transformers`, you will therefore need to pass `trust_remote_code=True`.
 
-The PNTK method employed in my other model [bhenrym14/airophin-13b-pntk-16k-fp16](https://huggingface.co/bhenrym14/airophin-13b-pntk-16k-fp16), is very similar to YaRN. For GPTQ, I have an exllama patch that I may adapt for YaRN, but the community appears motivated to rapidly implement YaRN natively, so I may not bother.
+The PNTK method employed in my other model [bhenrym14/airophin-13b-pntk-16k-fp16](https://huggingface.co/bhenrym14/airophin-13b-pntk-16k-fp16), is very similar to YaRN. For GPTQ, I have an exllama patch that I may adapt for YaRN, but the community appears motivated to rapidly implement YaRN in common libraries, so I may not bother.
 
 Please comment with any questions and feedback on how this model performs, especially at long context lengths!
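The README text above says that running the model with `Transformers` requires `trust_remote_code=True`, because the repo bundles a drop-in llama architecture replacement for YaRN. Below is a minimal sketch of what that usage could look like; the repo id placeholder, dtype, device mapping, and prompt are illustrative assumptions, not part of this commit.

```python
# Minimal sketch, assuming a placeholder repo id, fp16 weights, and `accelerate`
# installed so that device_map="auto" can place the model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bhenrym14/<this-model-repo>"  # placeholder: replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # required: the repo ships custom llama modeling code for YaRN
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Summarize the key points of a long document in a few sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```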