|
--- |
|
license: apache-2.0 |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
tags: |
|
- biology |
|
- text-generation-inference |
|
- aptamer |
|
--- |
|
## AptaGPT |
|
AptaGPT is a generative pre-trained language model for aptamer design. It focuses on generating a new aptamer sequence space, and was pre-trained and fine-tuned on the third and sixth rounds, respectively, of SELEX data against B-cell maturation antigen (BCMA).
|
## Dataset |
|
AptaGPT was pre-trained using a large dataset consisting of 108,229,900 sequences from the third round of the SELEX process targeting BCMA. This extensive dataset provided a robust foundation for learning generalized patterns in aptamer sequences. For fine-tuning, the model utilized 9,350 sequences from the sixth round of SELEX. All aptamer sequences used for both pre-training and fine-tuning are 35 nucleotides in length. |
|
## Requirements |
|
Before running the AptaGPT model, the following Python dependencies need to be installed: |
|
```bash
pip install transformers sentencepiece
```
|
## Usage Examples |
|
To load the model from the Hugging Face Hub:
|
```python
from transformers import pipeline

aptagpt = pipeline("text-generation", model="tmbj-aidd/aptagpt-bcma")
```
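
If you prefer explicit control over the tokenizer and model objects, the pipeline call above can also be expressed as direct loads. A minimal sketch, assuming the checkpoint follows the standard `transformers` causal-LM layout:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the same Hub checkpoint.
tokenizer = AutoTokenizer.from_pretrained("tmbj-aidd/aptagpt-bcma")
model = AutoModelForCausalLM.from_pretrained("tmbj-aidd/aptagpt-bcma")
```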
|
To generate aptamer sequences: |
|
```python
# Sample candidate aptamers from the "<|endoftext|>" start token.
# top_k, repetition_penalty, and num_return_sequences control the
# diversity and number of generated sequences.
sequences = aptagpt(
    "<|endoftext|>",
    max_length=15,
    do_sample=True,
    top_k=700,
    repetition_penalty=1.2,
    num_return_sequences=10,
)
print(sequences)
```
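
The pipeline returns a list of dicts, each with a `generated_text` field. Since all training sequences are 35 nucleotides long, a simple filter can keep only well-formed candidates. The sketch below is illustrative: the exact decoding artifacts (prompt token, whitespace between SentencePiece tokens) depend on the tokenizer, and `clean_sequences` is a hypothetical helper, not part of the model's API.

```python
import re

def clean_sequences(outputs, length=35):
    """Strip the start token and whitespace, then keep only pure
    A/C/G/T strings of the expected length (35 nt for AptaGPT)."""
    cleaned = []
    for out in outputs:
        seq = out["generated_text"].replace("<|endoftext|>", "")
        seq = re.sub(r"\s+", "", seq).upper()
        if len(seq) == length and set(seq) <= set("ACGT"):
            cleaned.append(seq)
    return cleaned

print(clean_sequences(sequences))
```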