---
library_name: transformers
tags:
- CodonTransformer
- Computational Biology
- Machine Learning
- Bioinformatics
- Synthetic Biology
license: apache-2.0
pipeline_tag: token-classification
---

![image/png](https://github.com/Adibvafa/CodonTransformer/raw/main/src/banner_final.png)

**CodonTransformer** is a tool for codon optimization that transforms protein sequences into DNA sequences tailored to your target organism. Whether you are a researcher or a practitioner in genetic engineering, CodonTransformer provides a comprehensive suite of features to facilitate your work. By leveraging the Transformer architecture and a user-friendly Jupyter notebook, it reduces the complexity of codon optimization, saving you time and effort.


## Authors
Adibvafa Fallahpour<sup>1,2</sup>\*, Vincent Gureghian<sup>3</sup>\*, Guillaume J. Filion<sup>2</sup>‡, Ariel B. Lindner<sup>3</sup>‡, Amir Pandi<sup>3</sup>

<sup>1</sup> Vector Institute for Artificial Intelligence, Toronto ON, Canada  
<sup>2</sup> University of Toronto Scarborough; Department of Biological Science; Scarborough ON, Canada  
<sup>3</sup> Université Paris Cité, INSERM U1284, Center for Research and Interdisciplinarity, F-75006 Paris, France  
\* These authors contributed equally to this work.  
‡ To whom correspondence should be addressed: <br>
[email protected], [email protected], [email protected]
<br>


## Use Case
**For an interactive demo, check out our [Google Colab Notebook.](https://adibvafa.github.io/CodonTransformer/GoogleColab)**
<br>
After installing CodonTransformer, you can use:
```python
import torch
from transformers import AutoTokenizer, BigBirdForMaskedLM
from CodonTransformer.CodonPrediction import predict_dna_sequence
from CodonTransformer.CodonJupyter import format_model_output
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")


# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("adibvafa/CodonTransformer")
model = BigBirdForMaskedLM.from_pretrained("adibvafa/CodonTransformer").to(DEVICE)


# Set your input data
protein = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
organism = "Escherichia coli general"


# Predict with CodonTransformer
output = predict_dna_sequence(
    protein=protein,
    organism=organism,
    device=DEVICE,
    tokenizer=tokenizer,
    model=model,
    attention_type="original_full",
)
print(format_model_output(output))
```
The output is:
<br>


```text
-----------------------------
|          Organism         |
-----------------------------
Escherichia coli general

-----------------------------
|       Input Protein       |
-----------------------------
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG

-----------------------------
|      Processed Input      |
-----------------------------
M_UNK A_UNK L_UNK W_UNK M_UNK R_UNK L_UNK L_UNK P_UNK L_UNK L_UNK A_UNK L_UNK L_UNK A_UNK L_UNK W_UNK G_UNK P_UNK D_UNK P_UNK A_UNK A_UNK A_UNK F_UNK V_UNK N_UNK Q_UNK H_UNK L_UNK C_UNK G_UNK S_UNK H_UNK L_UNK V_UNK E_UNK A_UNK L_UNK Y_UNK L_UNK V_UNK C_UNK G_UNK E_UNK R_UNK G_UNK F_UNK F_UNK Y_UNK T_UNK P_UNK K_UNK T_UNK R_UNK R_UNK E_UNK A_UNK E_UNK D_UNK L_UNK Q_UNK V_UNK G_UNK Q_UNK V_UNK E_UNK L_UNK G_UNK G_UNK __UNK

-----------------------------
|       Predicted DNA       |
-----------------------------
ATGGCTTTATGGATGCGTCTGCTGCCGCTGCTGGCGCTGCTGGCGCTGTGGGGCCCGGACCCGGCGGCGGCGTTTGTGAATCAGCACCTGTGCGGCAGCCACCTGGTGGAAGCGCTGTATCTGGTGTGCGGTGAGCGCGGCTTCTTCTACACGCCCAAAACCCGCCGCGAAGCGGAAGATCTGCAGGTGGGCCAGGTGGAGCTGGGCGGCTAA
```
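As a quick sanity check (this is not part of the CodonTransformer API), you can verify that the predicted DNA sequence translates back to the input protein under the standard genetic code. A minimal, self-contained sketch:

```python
# Standard genetic code (NCBI translation table 1): codons enumerated in
# TCAG order at each position, paired with the canonical amino-acid string
# ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: aa
    for (b1, b2, b3), aa in zip(
        ((b1, b2, b3) for b1 in BASES for b2 in BASES for b3 in BASES),
        AMINO_ACIDS,
    )
}

def translate(dna: str) -> str:
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i : i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Input protein and predicted DNA from the example above
protein = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
predicted_dna = (
    "ATGGCTTTATGGATGCGTCTGCTGCCGCTG"
    "CTGGCGCTGCTGGCGCTGTGGGGCCCGGAC"
    "CCGGCGGCGGCGTTTGTGAATCAGCACCTG"
    "TGCGGCAGCCACCTGGTGGAAGCGCTGTAT"
    "CTGGTGTGCGGTGAGCGCGGCTTCTTCTAC"
    "ACGCCCAAAACCCGCCGCGAAGCGGAAGAT"
    "CTGCAGGTGGGCCAGGTGGAGCTGGGCGGC"
    "TAA"
)
assert translate(predicted_dna) == protein
```

Any discrepancy here would indicate the DNA was copied or frame-shifted incorrectly, since codon optimization changes which synonymous codons are used but never the encoded protein.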


## Additional Resources
- **Project Website** <br>
  https://adibvafa.github.io/CodonTransformer/

- **GitHub Repository** <br>
  https://github.com/Adibvafa/CodonTransformer

- **Google Colab Demo** <br>
  https://adibvafa.github.io/CodonTransformer/GoogleColab

- **PyPI Package** <br>
  https://pypi.org/project/CodonTransformer/

- **Paper** <br>
  https://www.biorxiv.org/content/10.1101/2024.09.13.612903