BacteriaCDS-DNABERT-K6-89M
This model, BacteriaCDS-DNABERT-K6-89M, is a DNA sequence classifier based on DNABERT, fine-tuned for coding sequence (CDS) classification in bacterial genomes. It operates on 6-mer tokenized sequences and has 89M trainable parameters.
Model Details
- Base Model: DNABERT
- Task: Bacterial CDS Classification
- K-mer Size: 6
- Input Sequence: Open Reading Frame (last 510 nucleotides from the end of the sequence; see the sketch after this list)
- Number of Trainable Parameters: 89M
- Max Sequence Length: 512
- Precision Used: AMP (Automatic Mixed Precision)
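The input convention above means long ORFs are reduced to their final 510 nucleotides before k-mer tokenization. Below is a minimal illustrative sketch of that step; the helper name `truncate_orf` is not part of the released code.

```python
# Illustrative only: keep the last 510 nucleotides of an ORF, matching the
# "last 510 nucleotides from the end of the sequence" convention above.
# Returns the whole sequence when it is shorter than max_nt.
def truncate_orf(orf: str, max_nt: int = 510) -> str:
    return orf[-max_nt:]
```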
Install Dependencies
Ensure you have `transformers` and `torch` installed:
```bash
pip install torch transformers
```
Load Model & Tokenizer
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load Model
model_checkpoint = "Genereux-akotenou/BacteriaCDS-DNABERT-K6-89M"
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
```
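Optionally, and as an addition beyond the original snippet, you can move the model to a GPU when one is available and switch it to evaluation mode before running inference:

```python
# Optional: use a GPU if available and disable dropout for inference
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()
```

If you do this, remember to move the tokenized inputs to the same device (e.g. `inputs = {k: v.to(device) for k, v in inputs.items()}`) before calling the model.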
Inference Example
This model works with 6-mer tokenized sequences. You need to convert raw DNA sequences into k-mer format:
```python
def generate_kmer(sequence: str, k: int, overlap: int = 1):
    # `overlap` is the step between k-mer start positions
    # (k=6, overlap=3 yields 6-mers that overlap by 3 nucleotides)
    return " ".join([sequence[j:j+k] for j in range(0, len(sequence) - k + 1, overlap)])

sequence = "ATGAGAACCAGCCGGAGACCTCCTGCTCGTACATGAAAGGCTCGAGCAGCCGGGCGAGGGCGGTAG"
seq_kmer = generate_kmer(sequence, k=6, overlap=3)

# Run inference
inputs = tokenizer(
    seq_kmer,
    return_tensors="pt",
    max_length=tokenizer.model_max_length,
    padding="max_length",
    truncation=True,
)
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
predicted_class = torch.argmax(logits, dim=-1).item()
```
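To turn the predicted index into a human-readable label, you can look it up in the model config. `id2label` is a standard `transformers` config attribute, but the exact label names depend on how the model was fine-tuned, so treat this as a sketch:

```python
# Map the predicted class index to its label; fall back to the raw index
# if the config does not define an id2label entry for it.
label = model.config.id2label.get(predicted_class, str(predicted_class))
print(f"Predicted class: {predicted_class} ({label})")
```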