ShuxianZou committed
Update README.md
README.md CHANGED
@@ -1,6 +1,6 @@
-# AIDO.RNA
 
-AIDO.RNA is a general-purpose RNA foundation model with 1.6 billion parameters, trained on 42 million non-coding RNA sequences at single-nucleotide resolution. It achieves state-of-the-art performance on a comprehensive set of tasks, including RNA secondary structure prediction, mRNA-related tasks, RNA function prediction, and RNA inverse folding. After domain adaptation, AIDO.RNA excels in modeling protein-level tasks, highlighting its potential to leverage the central dogma for enhancing biomolecular representations.
 
 <p align="center">
 <img src="https://cdn-uploads.huggingface.co/production/uploads/63008d4bc1e149ceaff724a3/mNqn5SKQFHxSby3E2dosE.png" alt="description" style="width:80%; height:auto;">
@@ -35,18 +35,27 @@ Build any downstream models from this backbone
 ### Get RNA sequence embedding
 ```python
 from genbio_finetune.tasks import Embed
-model = Embed.from_config({"model.backbone": "
 collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
 embedding = model(collated_batch)
 print(embedding.shape)
 print(embedding)
 ```
 
 ### Sequence-level classification
 ```python
 import torch
 from genbio_finetune.tasks import SequenceClassification
-model = SequenceClassification.from_config({"model.backbone": "
 collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 logits = model(collated_batch)
 print(logits)
@@ -57,37 +66,28 @@ print(torch.argmax(logits, dim=-1))
 ```python
 import torch
 from genbio_finetune.tasks import TokenClassification
-model = TokenClassification.from_config({"model.backbone": "
 collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 logits = model(collated_batch)
 print(logits)
 print(torch.argmax(logits, dim=-1))
 ```
 
-
 ### Pairwise token-level classification
 @Sazan TODO
 
 
-### Sequence-level regression
-```python
-from genbio_finetune.tasks import SequenceRegression
-model = SequenceRegression.from_config({"model.backbone": "rnafm"}).eval()
-collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
-logits = model(collated_batch)
-print(logits)
-```
-
 ## RNA inverse folding
-@Sazan
 
 Or use our one-liner CLI to finetune or evaluate any of the above!
 ```bash
-
-
 ```
 
-For more information, visit: [
 
 ## Citation
 Please cite AIDO.RNA using the following BibTeX code:
@@ -1,6 +1,6 @@
+# AIDO.RNA-1.6B
 
+AIDO.RNA-1.6B is a general-purpose RNA foundation model with 1.6 billion parameters, trained on 42 million non-coding RNA sequences at single-nucleotide resolution. It achieves state-of-the-art performance on a comprehensive set of tasks, including RNA secondary structure prediction, mRNA-related tasks, RNA function prediction, and RNA inverse folding. After domain adaptation, AIDO.RNA excels in modeling protein-level tasks, highlighting its potential to leverage the central dogma for enhancing biomolecular representations.
 
 <p align="center">
 <img src="https://cdn-uploads.huggingface.co/production/uploads/63008d4bc1e149ceaff724a3/mNqn5SKQFHxSby3E2dosE.png" alt="description" style="width:80%; height:auto;">
@@ -35,18 +35,27 @@ Build any downstream models from this backbone
 ### Get RNA sequence embedding
 ```python
 from genbio_finetune.tasks import Embed
+model = Embed.from_config({"model.backbone": "aido_rna_1b600m"}).eval()
 collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
 embedding = model(collated_batch)
 print(embedding.shape)
 print(embedding)
 ```
 
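The `Embed` example above prints the embedding tensor, but the README does not state its layout. As a minimal sketch, assuming (this is an assumption, not something the README confirms) that the output holds one vector per token with shape `(batch, sequence_length, hidden_size)`, a single vector per sequence could be obtained by averaging over the token positions. The helper `mean_pool` and the toy numbers are hypothetical and use plain Python lists so the arithmetic can be checked without loading the model.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors (length x hidden) into one hidden-size vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    hidden_size = len(token_vectors[0])
    totals = [0.0] * hidden_size
    for vector in token_vectors:
        if len(vector) != hidden_size:
            raise ValueError("all token vectors must have the same size")
        for i, value in enumerate(vector):
            totals[i] += value
    return [total / len(token_vectors) for total in totals]


# Toy stand-in for one sequence of 4 tokens ("ACGT") with hidden size 2.
# These values are made up; they are not real model output.
toy_embedding = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
pooled = mean_pool(toy_embedding)
print(pooled)  # [4.0, 5.0]
```

On a real tensor of the assumed shape, `embedding.mean(dim=1)` in PyTorch would compute the same average, though padded positions may need to be masked out first.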
+### Sequence-level regression
+```python
+from genbio_finetune.tasks import SequenceRegression
+model = SequenceRegression.from_config({"model.backbone": "aido_rna_1b600m"}).eval()
+collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
+logits = model(collated_batch)
+print(logits)
+```
+
 ### Sequence-level classification
 ```python
 import torch
 from genbio_finetune.tasks import SequenceClassification
+model = SequenceClassification.from_config({"model.backbone": "aido_rna_1b600m", "model.n_classes": 2}).eval()
 collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 logits = model(collated_batch)
 print(logits)
@@ -57,37 +66,28 @@ print(torch.argmax(logits, dim=-1))
 ```python
 import torch
 from genbio_finetune.tasks import TokenClassification
+model = TokenClassification.from_config({"model.backbone": "aido_rna_1b600m", "model.n_classes": 3}).eval()
 collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 logits = model(collated_batch)
 print(logits)
 print(torch.argmax(logits, dim=-1))
 ```
 
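The `torch.argmax(logits, dim=-1)` call above picks the highest-scoring class for each token. If class probabilities are wanted as well, a softmax over the class dimension is the usual step. The sketch below shows that arithmetic on a hypothetical row of three logits (matching `"model.n_classes": 3`); `softmax` here is an illustrative helper written in plain Python, not part of `genbio_finetune`.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one token over 3 classes (not real model output).
token_logits = [2.0, 0.5, -1.0]
probs = softmax(token_logits)

# Index of the most probable class; this matches what argmax on the logits gives.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```

On the tensor itself, `torch.softmax(logits, dim=-1)` performs the same computation.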
 ### Pairwise token-level classification
 @Sazan TODO
 
 
 ## RNA inverse folding
+@Sazan
+
 
 Or use our one-liner CLI to finetune or evaluate any of the above!
 ```bash
+mgen fit --model SequenceClassification --model.backbone aido_rna_1b600m --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
+mgen test --model SequenceClassification --model.backbone aido_rna_1b600m --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
 ```
 
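The `mgen fit` and `mgen test` commands above differ only in the subcommand. As a small convenience sketch, the snippet below assembles both command lines from one set of arguments, using only the flags shown in the README. `build_mgen_command` is a hypothetical helper, and `path/to/dataset` is a placeholder to replace with your own Hugging Face or local dataset path.

```python
import shlex
from typing import List


def build_mgen_command(subcommand: str, task: str, backbone: str,
                       data: str, data_path: str) -> List[str]:
    """Build an argument list for `mgen fit` or `mgen test` from the README's flags."""
    if subcommand not in ("fit", "test"):
        raise ValueError("only the 'fit' and 'test' subcommands appear in the README")
    return [
        "mgen", subcommand,
        "--model", task,
        "--model.backbone", backbone,
        "--data", data,
        "--data.path", data_path,
    ]


# Placeholder dataset path; substitute a real Hugging Face or local path.
for sub in ("fit", "test"):
    cmd = build_mgen_command(
        sub, "SequenceClassification", "aido_rna_1b600m",
        "SequenceClassification", "path/to/dataset",
    )
    print(shlex.join(cmd))
```

Once `mgen` is installed, the resulting list could be handed to `subprocess.run(cmd, check=True)`.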
+For more information, visit: [ModelGenerator](https://github.com/genbio-ai/modelgenerator)
 
 ## Citation
 Please cite AIDO.RNA using the following BibTeX code: