---
license: mit
---
|
|
|
# TinyLlama-NoPE-1.1B
|
|
|
TinyLlama-NoPE-1.1B is a 1.1B-parameter Llama-architecture transformer language model trained without positional encoding (NoPE).
|
|
|
The model was trained with the TinyLlama code base (https://github.com/jzhang38/TinyLlama).
|
|
|
## Usage
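Because the checkpoint was trained without rotary position embeddings, inference requires disabling RoPE in the Hugging Face Llama implementation. The snippet below monkey-patches `modeling_llama.apply_rotary_pos_emb` so that queries and keys are returned unchanged, then loads the model as usual.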
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.models.llama import modeling_llama


def nope_monkey_patch(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
    # NoPE: skip the rotary position embedding and return queries/keys unchanged.
    # position_ids defaults to None so the patch matches the upstream signature.
    return q, k


# Replace the RoPE function in the Llama implementation before running generation.
modeling_llama.apply_rotary_pos_emb = nope_monkey_patch

model_path = "AntNLP/TinyLlama-NoPE-1.1B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).cuda()

input_ids = tokenizer("Hello, TinyLlama-NoPE", return_tensors="pt").input_ids.cuda()
output = model.generate(input_ids, do_sample=True, max_length=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
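As an optional sanity check (a minimal sketch, not part of the original recipe), you can confirm the patch is active by verifying that the patched function leaves dummy query/key tensors untouched. This assumes the snippet above has already been run in the same session; the tensor shapes are illustrative placeholders.

```python
import torch

# Dummy tensors shaped (batch, heads, seq_len, head_dim); the values are irrelevant,
# only the identity behaviour of the patched RoPE function is being checked.
q = torch.randn(1, 4, 8, 64)
k = torch.randn(1, 4, 8, 64)
cos = torch.ones(8, 64)
sin = torch.zeros(8, 64)

q_out, k_out = modeling_llama.apply_rotary_pos_emb(q, k, cos, sin)
assert torch.equal(q_out, q) and torch.equal(k_out, k), "RoPE is still being applied"
```

Note that the patch is process-wide: any other Llama model loaded in the same Python session will typically also run without rotary embeddings.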
|
|
|
## Citation
|
|
|
```
@misc{wang2024length,
      title={Length Generalization of Causal Transformers without Position Encoding},
      author={Jie Wang and Tao Ji and Yuanbin Wu and Hang Yan and Tao Gui and Qi Zhang and Xuanjing Huang and Xiaoling Wang},
      year={2024},
      eprint={2404.12224},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```