AptaGPT

AptaGPT is a generative pre-trained language model for aptamer design. It generates novel aptamer sequences and was pre-trained on third-round and fine-tuned on sixth-round SELEX data for B cell maturation antigen (BCMA).

Dataset

AptaGPT was pre-trained on 108,229,900 sequences from the third round of the SELEX process targeting BCMA. This large pre-training set is intended to teach the model general patterns in aptamer sequences. The model was then fine-tuned on 9,350 sequences from the sixth round of SELEX. All aptamer sequences used for pre-training and fine-tuning are 35 nucleotides long.

Requirements

Before running AptaGPT, install the following Python dependencies. The transformers pipeline also needs a deep learning backend such as PyTorch:

pip install transformers sentencepiece

Usage Examples

To load the model from Hugging Face:

from transformers import pipeline
aptagpt = pipeline('text-generation', model="tmbj-aidd/aptagpt-bcma")

To generate aptamer sequences:

# max_length is counted in tokens, not nucleotides.
sequences = aptagpt(
    "<|endoftext|>",
    max_length=15,
    do_sample=True,
    top_k=700,
    repetition_penalty=1.2,
    num_return_sequences=10,
)
print(sequences)
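
The pipeline returns a list of dictionaries, each with a "generated_text" entry, so the raw output usually needs some cleanup before use. The following is a minimal sketch of one way to do that, not part of AptaGPT itself. The helper name clean_sequences is hypothetical, and it assumes a DNA alphabet (ACGT), that any whitespace or "<|endoftext|>" markers in the decoded text can simply be dropped, and that only sequences matching the 35-nucleotide training length should be kept. Adjust the alphabet and length if your setup differs. The example input is made-up data in the pipeline's output shape, not real model output.

```python
def clean_sequences(outputs, length=35, alphabet="ACGT"):
    """Return unique sequences of the expected length and alphabet.

    `outputs` is a list of dicts with a "generated_text" key, which is the
    shape returned by a transformers text-generation pipeline.
    The alphabet and length defaults are assumptions, not model facts.
    """
    allowed = set(alphabet)
    seen = set()
    kept = []
    for item in outputs:
        # Drop the prompt/end marker, then remove whitespace and normalise case.
        text = item["generated_text"].replace("<|endoftext|>", "")
        seq = "".join(text.split()).upper()
        # Keep only well-formed sequences that have not been seen before.
        if len(seq) == length and set(seq) <= allowed and seq not in seen:
            seen.add(seq)
            kept.append(seq)
    return kept


# Made-up stand-in for pipeline output, used only to illustrate the filter.
example = [
    {"generated_text": "<|endoftext|>" + "ACGT" * 8 + "ACG"},  # valid, 35 nt
    {"generated_text": "<|endoftext|>ACGTACGT"},               # too short
    {"generated_text": "acgt " * 8 + "acg"},                   # duplicate after cleanup
    {"generated_text": "N" * 35},                              # outside the alphabet
]
print(clean_sequences(example))
```

With real output, the call would be clean_sequences(sequences). Filtering on length is a conservative choice: it discards truncated or overlong generations instead of trying to repair them.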

Model size

86.6M parameters, stored as F32 tensors in Safetensors format.