bhenrym14 committed on
Commit b6de33f
1 Parent(s): 834b1fb

Update README.md

Files changed (1): README.md (+1, -1)
README.md CHANGED
@@ -19,7 +19,7 @@ The finetune was performed with 1x RTX 6000 Ada.
 
 YaRN is not implemented natively in `Transformers`. The YaRN pretrained model [NousResearch/Yarn-Llama-2-13b-64k](https://huggingface.co/NousResearch/Yarn-Llama-2-13b-64k) contains a drop-in llama architecture replacement that interfaces with the included configuration file. **To maximize compatibility, I have included the version that omits flash attention.** To run using `Transformers`, you will therefore need to pass `trust_remote_code=True`.
 
-The PNTK method employed in my other model [bhenrym14/airophin-13b-pntk-16k-fp16](https://huggingface.co/bhenrym14/airophin-13b-pntk-16k-fp16), is very similar to YaRN. For GPTQ, I have an exllama patch that I may adapt for YaRN, but the community appears motivated to rapidly implement YaRN natively, so I may not bother.
+The PNTK method employed in my other model [bhenrym14/airophin-13b-pntk-16k-fp16](https://huggingface.co/bhenrym14/airophin-13b-pntk-16k-fp16), is very similar to YaRN. For GPTQ, I have an exllama patch that I may adapt for YaRN, but the community appears motivated to rapidly implement YaRN in common libraries, so I may not bother.
 
 Please comment with any questions and feedback on how this model performs, especially at long context lengths!
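The README text above says that running the model with `Transformers` requires `trust_remote_code=True`, because the repo bundles a drop-in llama architecture replacement for YaRN. Below is a minimal sketch of what that usage could look like; the repo id placeholder, dtype, device mapping, and prompt are illustrative assumptions, not part of this commit.

```python
# Minimal sketch, assuming a placeholder repo id, fp16 weights, and `accelerate`
# installed so that device_map="auto" can place the model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bhenrym14/<this-model-repo>"  # placeholder: replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # required: the repo ships custom llama modeling code for YaRN
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Summarize the key points of a long document in a few sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```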