DeBERTa (1.5B) fixed version

This is deberta-v2-xxlarge, updated to implement the AutoModelForCausalLM class so that it can generate text. The implementation is based on our paper "BERTs are Generative In-Context Learners".

This repository also fixes three bugs in the original HF implementation of DeBERTa:

  1. We fixed the incorrect name of the output embedding weights in the checkpoint file;
  2. We fixed the implementation of the enhanced mask decoder (EMD), based on the original GitHub repository;
  3. We clamp the positional embeddings so that they work with long sequence lengths.
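
To illustrate the third fix, here is a minimal sketch of clamping relative positions so that distant token pairs reuse the outermost embedding rows instead of indexing out of range. It is only an illustration under stated assumptions, not the code used in this repository: the function name `clamped_relative_positions` and the `max_distance` parameter are hypothetical, and the real DeBERTa implementation additionally uses log-bucketed positions.

```python
from typing import List


def clamped_relative_positions(seq_len: int, max_distance: int) -> List[List[int]]:
    """Hypothetical helper: index into a table of 2 * max_distance position embeddings.

    The relative position (i - j) of each query/key pair is clamped to
    [-max_distance, max_distance - 1] and then shifted to be non-negative.
    """
    table = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            rel = i - j
            # Clamp: anything farther away than max_distance shares one index.
            rel = max(-max_distance, min(max_distance - 1, rel))
            row.append(rel + max_distance)
        table.append(row)
    return table


# A sequence longer than the position range still yields valid indices.
positions = clamped_relative_positions(seq_len=6, max_distance=2)
print(positions[0])  # [2, 1, 0, 0, 0, 0]
print(positions[5])  # [3, 3, 3, 3, 3, 2]
```

Without the clamp, a pair of tokens farther apart than `max_distance` would produce an index outside the embedding table.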

Example code

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ltg/deberta-xxlarge-fixed", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("ltg/deberta-xxlarge-fixed", trust_remote_code=True).cuda().eval()

prompt = """German: Hallo, wie geht es Ihnen heute?
English:"""
# Replace each real newline with a literal backslash-n followed by a space.
prompt = prompt.replace('\n', '\\n ')
input_ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.cuda()

prediction = model.generate(
    input_ids,
    num_beams=4,
    do_sample=False,
    use_cache=None,
    max_new_tokens=64,
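    # Stop at the backslash that starts the next literal "\n": tokenize ".\" and drop the first token id.
    eos_token_id=tokenizer(".\\", add_special_tokens=False).input_ids[1:]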
)
# Keep only the newly generated tokens, then strip the trailing backslash.
prediction = prediction[0, input_ids.size(1):]
prediction = tokenizer.decode(prediction).rstrip('\\')

# Expected output: "Hello, how are you doing today?"
print(prediction)
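
The example above is zero-shot. As a hedged sketch, the same prompt format could be extended with a few demonstrations. The helper `build_prompt` is hypothetical and not part of this repository; it assumes that demonstrations use the same "Language: text" lines and the same newline escaping as the example above.

```python
from typing import List, Tuple


def build_prompt(
    examples: List[Tuple[str, str]],
    query: str,
    source: str = "German",
    target: str = "English",
) -> str:
    """Hypothetical helper: format demonstrations and a query as one prompt."""
    lines = []
    for src_text, tgt_text in examples:
        lines.append(f"{source}: {src_text}")
        lines.append(f"{target}: {tgt_text}")
    lines.append(f"{source}: {query}")
    lines.append(f"{target}:")
    prompt = "\n".join(lines)
    # Same escaping as in the example: real newline -> literal backslash-n plus space.
    return prompt.replace('\n', '\\n ')


zero_shot = build_prompt([], "Hallo, wie geht es Ihnen heute?")
one_shot = build_prompt([("Guten Morgen!", "Good morning!")], "Hallo, wie geht es Ihnen heute?")
print(one_shot)
```

With an empty list of demonstrations, `build_prompt` returns the same string as the `prompt` variable in the example above, so the result can be passed to the tokenizer in the same way.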

Citation

If you find DeBERTa useful for your work, please cite the following papers:

@misc{samuel2024berts,
  title={{BERTs} are Generative In-Context Learners}, 
  author={David Samuel},
  year={2024},
  eprint={2406.04823},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2406.04823}
}
@inproceedings{he2021deberta,
  title={{DeBERTa}: Decoding-enhanced {BERT} with disentangled attention},
  author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=XPZIaotutsD}
}