Commit 75693bc (verified) by ShuxianZou · 1 parent: 3423cbe

Update README.md

Files changed (1)
  1. README.md +19 -19
README.md CHANGED
@@ -1,6 +1,6 @@
1
- # AIDO.RNA 1.6B
2
 
3
- AIDO.RNA is a general-purpose RNA foundation model with 1.6 billion parameters, trained on 42 million non-coding RNA sequences at single-nucleotide resolution. It achieves state-of-the-art performance on a comprehensive set of tasks, including RNA secondary structure prediction, mRNA-related tasks, RNA function prediction, and RNA inverse folding. After domain adaptation, AIDO.RNA excels in modeling protein-level tasks, highlighting its potential to leverage the central dogma for enhancing biomolecular representations.
4
 
5
  <p align="center">
6
  <img src="https://cdn-uploads.huggingface.co/production/uploads/63008d4bc1e149ceaff724a3/mNqn5SKQFHxSby3E2dosE.png" alt="description" style="width:80%; height:auto;">
@@ -35,18 +35,27 @@ Build any downstream models from this backbone
35
  ### Get RNA sequence embedding
36
  ```python
37
  from genbio_finetune.tasks import Embed
38
- model = Embed.from_config({"model.backbone": "rnafm"}).eval()
39
  collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
40
  embedding = model(collated_batch)
41
  print(embedding.shape)
42
  print(embedding)
43
  ```
44
 
45
  ### Sequence-level classification
46
  ```python
47
  import torch
48
  from genbio_finetune.tasks import SequenceClassification
49
- model = SequenceClassification.from_config({"model.backbone": "rnafm", "model.n_classes": 2}).eval()
50
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
51
  logits = model(collated_batch)
52
  print(logits)
@@ -57,37 +66,28 @@ print(torch.argmax(logits, dim=-1))
57
  ```python
58
  import torch
59
  from genbio_finetune.tasks import TokenClassification
60
- model = TokenClassification.from_config({"model.backbone": "rnafm", "model.n_classes": 3}).eval()
61
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
62
  logits = model(collated_batch)
63
  print(logits)
64
  print(torch.argmax(logits, dim=-1))
65
  ```
66
 
67
-
68
  ### Pairwise token-level classification
69
  @Sazan TODO
70
 
71
 
72
- ### Sequence-level regression
73
- ```python
74
- from genbio_finetune.tasks import SequenceRegression
75
- model = SequenceRegression.from_config({"model.backbone": "rnafm"}).eval()
76
- collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
77
- logits = model(collated_batch)
78
- print(logits)
79
- ```
80
-
81
  ## RNA inverse folding
82
- @Sazan TODO
 
83
 
84
  Or use our one-liner CLI to finetune or evaluate any of the above!
85
  ```bash
86
- gbft fit --model SequenceClassification --model.backbone rnafm --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
87
- gbft test --model SequenceClassification --model.backbone rnafm --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
88
  ```
89
 
90
- For more information, visit: [Model Generator](https://github.com/genbio-ai/modelgenerator)
91
 
92
  ## Citation
93
  Please cite AIDO.RNA using the following BibTeX code:
 
1
+ # AIDO.RNA-1.6B
2
 
3
+ AIDO.RNA-1.6B is a general-purpose RNA foundation model with 1.6 billion parameters, trained on 42 million non-coding RNA sequences at single-nucleotide resolution. It achieves state-of-the-art performance on a comprehensive set of tasks, including RNA secondary structure prediction, mRNA-related tasks, RNA function prediction, and RNA inverse folding. After domain adaptation, AIDO.RNA excels in modeling protein-level tasks, highlighting its potential to leverage the central dogma for enhancing biomolecular representations.
4
 
5
  <p align="center">
6
  <img src="https://cdn-uploads.huggingface.co/production/uploads/63008d4bc1e149ceaff724a3/mNqn5SKQFHxSby3E2dosE.png" alt="description" style="width:80%; height:auto;">
 
35
  ### Get RNA sequence embedding
36
  ```python
37
  from genbio_finetune.tasks import Embed
38
+ model = Embed.from_config({"model.backbone": "aido_rna_1b600m"}).eval()
39
  collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
40
  embedding = model(collated_batch)
41
  print(embedding.shape)
42
  print(embedding)
43
  ```
44
 
45
+ ### Sequence-level regression
46
+ ```python
47
+ from genbio_finetune.tasks import SequenceRegression
48
+ model = SequenceRegression.from_config({"model.backbone": "aido_rna_1b600m"}).eval()
49
+ collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
50
+ logits = model(collated_batch)
51
+ print(logits)
52
+ ```
53
+
54
  ### Sequence-level classification
55
  ```python
56
  import torch
57
  from genbio_finetune.tasks import SequenceClassification
58
+ model = SequenceClassification.from_config({"model.backbone": "aido_rna_1b600m", "model.n_classes": 2}).eval()
59
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
60
  logits = model(collated_batch)
61
  print(logits)
 
66
  ```python
67
  import torch
68
  from genbio_finetune.tasks import TokenClassification
69
+ model = TokenClassification.from_config({"model.backbone": "aido_rna_1b600m", "model.n_classes": 3}).eval()
70
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
71
  logits = model(collated_batch)
72
  print(logits)
73
  print(torch.argmax(logits, dim=-1))
74
  ```
75
 
 
76
  ### Pairwise token-level classification
77
  @Sazan TODO
78
 
79
 
80
  ## RNA inverse folding
81
+ @Sazan
82
+
83
 
84
  Or use our one-liner CLI to finetune or evaluate any of the above!
85
  ```bash
86
+ mgen fit --model SequenceClassification --model.backbone aido_rna_1b600m --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
87
+ mgen test --model SequenceClassification --model.backbone aido_rna_1b600m --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
88
  ```
89
 
90
+ For more information, visit: [ModelGenerator](https://github.com/genbio-ai/modelgenerator)
91
 
92
  ## Citation
93
  Please cite AIDO.RNA using the following BibTeX code: