probablybots committed on
Commit c3b4f26 · verified · 1 Parent(s): c2ac139

Update README.md

Files changed (1)
  1. README.md +37 -47
README.md CHANGED
@@ -30,80 +30,70 @@ The pre-training data contains 42 million unique ncRNA sequences from RNAcentral
 
 
  ## How to Use
- Build any downstream models from this backbone
 
- ### Get RNA sequence embedding
  ```python
- from genbio_finetune.tasks import Embed
- model = Embed.from_config({"model.backbone": "aido_rna_1b600m"}).eval()
- collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
  embedding = model(collated_batch)
  print(embedding.shape)
  print(embedding)
  ```
-
- ### Sequence-level regression
  ```python
- from genbio_finetune.tasks import SequenceRegression
- model = SequenceRegression.from_config({"model.backbone": "aido_rna_1b600m"}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  ```
-
- ### Sequence-level classification
  ```python
  import torch
- from genbio_finetune.tasks import SequenceClassification
- model = SequenceClassification.from_config({"model.backbone": "aido_rna_1b600m", "model.n_classes": 2}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  print(torch.argmax(logits, dim=-1))
  ```
-
- ### Token-level classification
  ```python
- import torch
- from genbio_finetune.tasks import TokenClassification
- model = TokenClassification.from_config({"model.backbone": "aido_rna_1b600m", "model.n_classes": 3}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
- print(torch.argmax(logits, dim=-1))
- ```
-
- ### Pairwise token-level classification
- @Sazan TODO
-
 
- ## RNA inverse folding
- @Sazan
-
-
- Or use our one-liner CLI to finetune or evaluate any of the above!
- ```bash
- mgen fit --model SequenceClassification --model.backbone aido_rna_1b600m --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
- mgen test --model SequenceClassification --model.backbone aido_rna_1b600m --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
  ```
 
- For more information, visit: [ModelGenerator](https://github.com/genbio-ai/modelgenerator)
-
  ## Citation
  Please cite AIDO.RNA using the following BibTeX code:
  ```
- @inproceedings{
- zou2024a,
- title={A Large-Scale Foundation Model for {RNA} Function and Structure Prediction},
- author={Shuxian Zou and Tianhua Tao and Sazan Mahbub and Caleb Ellington and Robin Jonathan Algayres and Dian Li and Yonghao Zhuang and Hongyi Wang and Le Song and Eric P. Xing},
- booktitle={NeurIPS 2024 Workshop on AI for New Drug Modalities},
- year={2024},
- url={https://openreview.net/forum?id=Gzo3JMPY8w}
  }
  ```
-
-
- ## License
- @Hongyi TODO
-
-
 
 
  ## How to Use
+ ### Build any downstream models from this backbone with ModelGenerator
+ For more information, visit: [Model Generator](https://github.com/genbio-ai/modelgenerator)
+ ```bash
+ mgen fit --model SequenceClassification --model.backbone aido_rna_1b600m --data SequenceClassificationDataModule --data.path <hf_or_local_path_to_your_dataset>
+ mgen test --model SequenceClassification --model.backbone aido_rna_1b600m --data SequenceClassificationDataModule --data.path <hf_or_local_path_to_your_dataset>
+ ```
 
+ ### Or use directly in Python
+ #### Embedding
  ```python
+ from modelgenerator.tasks import Embed
+ model = Embed.from_config({"model.backbone": "aido_rna_1b600m"}).eval()
+ collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  embedding = model(collated_batch)
  print(embedding.shape)
  print(embedding)
  ```
+ #### Sequence-level Classification
  ```python
+ import torch
+ from modelgenerator.tasks import SequenceClassification
+ model = SequenceClassification.from_config({"model.backbone": "aido_rna_1b600m", "model.n_classes": 2}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
+ print(torch.argmax(logits, dim=-1))
  ```
+ #### Token-level Classification
  ```python
  import torch
+ from modelgenerator.tasks import TokenClassification
+ model = TokenClassification.from_config({"model.backbone": "aido_rna_1b600m", "model.n_classes": 3}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  print(torch.argmax(logits, dim=-1))
  ```
+ #### Sequence-level Regression
  ```python
+ from modelgenerator.tasks import SequenceRegression
+ model = SequenceRegression.from_config({"model.backbone": "aido_rna_1b600m"}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
 
+ ### Get RNA sequence embedding
+ ```python
+ from modelgenerator.tasks import Embed
+ model = Embed.from_config({"model.backbone": "aido_rna_1b600m"}).eval()
+ collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
+ embedding = model(collated_batch)
+ print(embedding.shape)
+ print(embedding)
  ```
 
  ## Citation
  Please cite AIDO.RNA using the following BibTeX code:
  ```
+ @misc{zou_large-scale_2024,
+ title = {A Large-Scale Foundation Model for RNA Function and Structure Prediction},
+ url = {https://www.biorxiv.org/content/10.1101/2024.11.28.625345v1},
+ doi = {10.1101/2024.11.28.625345},
+ publisher = {bioRxiv},
+ author = {Zou, Shuxian and Tao, Tianhua and Mahbub, Sazan and Ellington, Caleb N. and Algayres, Robin and Li, Dian and Zhuang, Yonghao and Wang, Hongyi and Song, Le and Xing, Eric P.},
+ year = {2024},
  }
  ```
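
The classification examples in this README end with `torch.argmax(logits, dim=-1)`. As a rough illustration of what that step computes, the sketch below uses plain Python lists in place of the model's output tensor, so it runs without installing ModelGenerator or downloading the backbone. The logit values and the helper names `softmax` and `predict` are made up for illustration and are not part of ModelGenerator.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(batch_logits):
    """Return (class index, probability) for each row of logits.

    The class index is the position of the largest logit, which is what
    an argmax over the last dimension selects for each row.
    """
    results = []
    for row in batch_logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((best, probs[best]))
    return results


# Hypothetical logits for two sequences and two classes (illustrative only).
example_logits = [[0.2, 1.3], [2.0, -1.0]]
predictions = predict(example_logits)
for index, prob in predictions:
    print(index, round(prob, 3))
```

Softmax does not change which class has the highest score, so the predicted index matches a plain argmax over the logits. The probability is only a convenience for reading the model's confidence.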