
VHHBERT

VHHBERT is a RoBERTa-based model pre-trained on two million VHH sequences in VHHCorpus-2M. VHHBERT has the same architecture as RoBERTa-base, except that it uses positional embeddings of length 185 to cover the maximum sequence length of 179 in VHHCorpus-2M. Further details on VHHBERT are described in our paper "A SARS-CoV-2 Interaction Dataset and VHH Sequence Corpus for Antibody Language Models."
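
As a quick sanity check, the extended positional embedding size can be read from the published configuration. This is a minimal sketch, assuming the config exposes the standard max_position_embeddings field used by RoBERTa models in transformers.

from transformers import RobertaConfig

# Load VHHBERT's configuration from the Hugging Face Hub
config = RobertaConfig.from_pretrained("COGNANO/VHHBERT")

# Expected to reflect the extended 185-position embedding described above
print(config.max_position_embeddings)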

Usage

The model and tokenizer can be loaded using the transformers library.

from transformers import BertTokenizer, RobertaModel

# Download the tokenizer and pre-trained weights from the Hugging Face Hub
tokenizer = BertTokenizer.from_pretrained("COGNANO/VHHBERT")
model = RobertaModel.from_pretrained("COGNANO/VHHBERT")
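
Once loaded, the model can embed a VHH amino acid sequence. The sketch below is illustrative only: the sequence is a hypothetical placeholder, and it assumes a per-residue vocabulary that expects space-separated tokens (as in some other protein language models); adjust to the tokenizer's actual convention.

import torch

# Hypothetical VHH sequence fragment (placeholder for illustration only)
sequence = "QVQLVESGGGLVQPGGSLRLSCAAS"

# Space-separate the residues, assuming a per-residue vocabulary
inputs = tokenizer(" ".join(sequence), return_tensors="pt")

# Forward pass without gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Per-token representations: (batch, sequence length, hidden size)
embeddings = outputs.last_hidden_state
print(embeddings.shape)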


Citation

If you use VHHBERT in your research, please cite the following paper.

@article{tsuruta2024sars,
  title={A {SARS}-{C}o{V}-2 Interaction Dataset and {VHH} Sequence Corpus for Antibody Language Models},
  author={Hirofumi Tsuruta and Hiroyuki Yamazaki and Ryota Maeda and Ryotaro Tamura and Akihiro Imura},
  journal={arXiv preprint arXiv:2405.18749},
  year={2024}
}