ShuxianZou committed on
Update README.md

README.md CHANGED
@@ -1,33 +1,86 @@
# AIDO.RNA 1.6B

AIDO.RNA is a 1.6B-parameter RNA foundation model trained on 42 million non-coding RNA sequences at single-nucleotide resolution. It achieves state-of-the-art performance on a comprehensive set of tasks, including RNA secondary structure prediction, mRNA-related tasks, RNA function prediction, and RNA inverse folding.

<img src="https://cdn-uploads.huggingface.co/production/uploads/63008d4bc1e149ceaff724a3/mNqn5SKQFHxSby3E2dosE.png" alt="description" style="width:80%; height:auto;">
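
Single-nucleotide resolution means each base is its own token, rather than a k-mer or subword chunk. Below is a minimal sketch of such a character-level tokenizer. The vocabulary `VOCAB`, its token IDs, and the choice to read DNA-style `T` as `U` are all hypothetical assumptions; the actual AIDO.RNA tokenizer may differ.

```python
# Hypothetical vocabulary: the real AIDO.RNA token IDs may differ.
VOCAB = {"[PAD]": 0, "[CLS]": 1, "[UNK]": 2, "A": 3, "C": 4, "G": 5, "U": 6}


def tokenize(sequence: str) -> list[int]:
    """Map an RNA sequence to exactly one token ID per nucleotide.

    Assumption: DNA-style 'T' is read as 'U'. Any character outside the
    vocabulary is mapped to [UNK] instead of being dropped, so the output
    length always equals the input length.
    """
    normalized = sequence.upper().replace("T", "U")
    return [VOCAB.get(base, VOCAB["[UNK]"]) for base in normalized]


print(tokenize("ACGU"))
print(tokenize("ACGT"))  # same IDs as "ACGU" under the T -> U assumption
```

Keeping one token per base means every model output position lines up with exactly one nucleotide. This alignment is what makes the token-level and pairwise tasks below well defined.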

## Model architectural details

TODO

## Pre-training data

TODO

## Downstream evaluation

TODO

## How to Use

Build any downstream model on top of this backbone.

### Get RNA sequence embedding

```python
from genbio_finetune.tasks import Embed

# Build an embedding model on top of the pre-trained backbone
model = Embed.from_config({"model.backbone": "rnafm"})
# Collate raw RNA sequences into a batched model input
collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
embedding = model(collated_batch)
print(embedding.shape)
print(embedding)
```
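
Suppose the embedding above is per-nucleotide, with shape `(batch, seq_len, hidden)`. This is an assumption, so check the printed `embedding.shape`. In that case, one vector per sequence is commonly obtained by mean-pooling over the non-padding positions. The plain-Python sketch below illustrates masked mean pooling; `mask` is a hypothetical 0/1 list marking real tokens and is not a documented `genbio_finetune` output.

```python
def masked_mean_pool(embeddings, mask):
    """Average per-token vectors into one vector per sequence.

    embeddings: nested lists of shape (batch, seq_len, hidden)
    mask: nested lists of shape (batch, seq_len); 1 = real token, 0 = padding
    """
    pooled = []
    for token_vecs, token_mask in zip(embeddings, mask):
        hidden = len(token_vecs[0])
        total = [0.0] * hidden
        count = 0
        for vec, keep in zip(token_vecs, token_mask):
            if keep:
                total = [t + v for t, v in zip(total, vec)]
                count += 1
        # max(count, 1) guards against dividing by zero for an all-padding row
        pooled.append([t / max(count, 1) for t in total])
    return pooled


# Two sequences, 3 positions, hidden size 2; the second ends in one padding token.
emb = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 0.0], [4.0, 2.0], [100.0, 100.0]],
]
mask = [[1, 1, 1], [1, 1, 0]]
print(masked_mean_pool(emb, mask))  # [[3.0, 4.0], [3.0, 1.0]]
```

The mask matters: without it, the padding vector `[100.0, 100.0]` would dominate the second sequence's average.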

### Sequence-level classification

```python
import torch
from genbio_finetune.tasks import SequenceClassification

# Attach a 2-class sequence classification head to the backbone
model = SequenceClassification.from_config({"model.backbone": "rnafm", "model.n_classes": 2})
collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
logits = model(collated_batch)
print(logits)
# Predicted class index for each sequence
print(torch.argmax(logits, dim=-1))
```
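
`torch.argmax` returns only the winning class. To also get class probabilities, apply a softmax over the last dimension of the logits; in PyTorch this is `torch.softmax(logits, dim=-1)`. The computation is sketched below in plain Python for a single row of logits, using the standard max-subtraction trick for numerical stability. The example logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert one row of raw logits into probabilities that sum to 1."""
    # Subtracting the max leaves the result unchanged but keeps exp() from overflowing.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (predicted class index, its probability) for one sequence."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


row = [0.0, math.log(3.0)]  # hypothetical 2-class logits
print(softmax(row))  # roughly [0.25, 0.75]
print(predict(row))
```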

### Token-level classification

```python
import torch
from genbio_finetune.tasks import TokenClassification

# Attach a 3-class per-token classification head to the backbone
model = TokenClassification.from_config({"model.backbone": "rnafm", "model.n_classes": 3})
collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
logits = model(collated_batch)
print(logits)
# Predicted class index for each token of each sequence
print(torch.argmax(logits, dim=-1))
```
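
Token-level predictions have one class per position. When scoring them, padded positions (and possibly special tokens, depending on the tokenizer) should not count. Below is a minimal sketch of per-token accuracy that skips such positions. It assumes the common convention of labelling ignored positions with `-100`; that value and the example data are assumptions, not something `genbio_finetune` is documented here to use.

```python
IGNORE_INDEX = -100  # common "skip this position" label convention; an assumption here


def token_accuracy(pred_ids, label_ids, ignore_index=IGNORE_INDEX):
    """Fraction of labelled positions where the predicted class matches the label.

    pred_ids, label_ids: nested lists of shape (batch, seq_len).
    Positions labelled `ignore_index` (e.g. padding) are excluded entirely.
    """
    correct = total = 0
    for preds, labels in zip(pred_ids, label_ids):
        for p, y in zip(preds, labels):
            if y == ignore_index:
                continue
            total += 1
            correct += int(p == y)
    return correct / total if total else 0.0


preds = [[0, 2, 1, 1], [2, 2, 0, 0]]
labels = [[0, 2, 2, 1], [2, -100, -100, -100]]  # second sequence is mostly padding
print(token_accuracy(preds, labels))  # 0.8
```

Without the mask, the three padded positions in the second row would be scored as errors and drag the accuracy down to 0.5.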

### Pairwise token-level classification

@Sazan TODO
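
Pairwise token-level tasks predict a label for every pair of positions (i, j). For a sequence of length L, this gives an L × L output. RNA secondary structure prediction is the canonical case, where the target is a base-pair matrix. The sketch below illustrates only that target format, not the `genbio_finetune` pairwise API. It converts a dot-bracket string into a symmetric 0/1 pairing matrix and supports only `(`, `)`, and `.` (no pseudoknot brackets).

```python
def dot_bracket_to_pairs(structure):
    """Convert dot-bracket notation into a symmetric L x L 0/1 base-pair matrix.

    Only '(', ')' and '.' are supported; pseudoknot brackets such as '[' ']' are not.
    """
    n = len(structure)
    matrix = [[0] * n for _ in range(n)]
    stack = []  # positions of '(' still waiting for a partner
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            i = stack.pop()
            matrix[i][j] = matrix[j][i] = 1  # pairing is symmetric
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return matrix


for row in dot_bracket_to_pairs("((..))"):
    print(row)
```

A pairwise head then outputs one score per matrix cell, and training compares those scores to this matrix cell by cell.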

### Sequence-level regression

```python
from genbio_finetune.tasks import SequenceRegression

# Attach a regression head to the backbone
model = SequenceRegression.from_config({"model.backbone": "rnafm"})
collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
# Continuous outputs, so no argmax is applied
predictions = model(collated_batch)
print(predictions)
```
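
Regression heads like this one are typically trained and evaluated with mean squared error. Whether `SequenceRegression` uses exactly this loss is an assumption. Below is a plain-Python sketch of the metric, comparing hypothetical predictions against hypothetical targets.

```python
def mean_squared_error(predictions, targets):
    """Average of squared differences between predicted and true values."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need two non-empty lists of equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


preds = [0.5, 1.5, 2.0]  # hypothetical model outputs
targets = [1.0, 1.0, 2.0]  # hypothetical ground-truth values
print(mean_squared_error(preds, targets))
```

Squaring penalizes large errors more than small ones. If outliers in your targets are a concern, mean absolute error is the usual alternative.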

Or use our one-liner CLI to fine-tune or evaluate any of the above!

```bash
# Fine-tune
gbft fit --model SequenceClassification --model.backbone rnafm --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
# Evaluate
gbft test --model SequenceClassification --model.backbone rnafm --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
```

For more information, visit [Model Generator](https://github.com/genbio-ai/test).

## Citation

Please cite AIDO.RNA using the following BibTeX entry:

@inproceedings{ellington2024accurate,
  title={Accurate and General {DNA} Representations Emerge from Genome Foundation Models at Scale},
  author={Caleb Ellington and Ning Sun and Nicholas Ho and Tianhua Tao and Sazan Mahbub and Yonghao Zhuang and Hongyi Wang and Eric P. Xing and Le Song},
  booktitle={NeurIPS 2024 Workshop on AI for New Drug Modalities},
  year={2024}
}

## License

@Hongyi TODO