ZhiyuanChen committed on
Commit 6a3616f
1 Parent(s): faebe15

Update README.md

Files changed (1)
  1. README.md +45 -32
README.md CHANGED
@@ -10,6 +10,19 @@ library_name: multimolecule
10   pipeline_tag: fill-mask
11   mask_token: "<mask>"
12   widget:
13   - example_title: "microRNA-21"
14   text: "UAGC<mask>UAUCAGACUGAUGUUGA"
15   output:
@@ -47,7 +60,7 @@ ERNIE-RNA is a [bert](https://huggingface.co/google-bert/bert-base-uncased)-styl
47   ### Variations
48
49   - **[`multimolecule/ernierna`](https://huggingface.co/multimolecule/ernierna)**: The ERNIE-RNA model pre-trained on non-coding RNA sequences.
50 - - **[`multimolecule/ernierna.ss`](https://huggingface.co/multimolecule/ernierna.ss)**: The ERNIE-RNA model fine-tuned on RNA secondary structure prediction.
51
52   ### Model Specification
53
@@ -62,7 +75,7 @@ ERNIE-RNA is a [bert](https://huggingface.co/google-bert/bert-base-uncased)-styl
62   - **Paper**: [ERNIE-RNA: An RNA Language Model with Structure-enhanced Representations](https://doi.org/10.1101/2024.03.17.585376)
63   - **Developed by**: Weijie Yin, Zhaoyu Zhang, Liang He, Rui Jiang, Shuo Zhang, Gan Liu, Xuegong Zhang, Tao Qin, Zhen Xie
64   - **Model type**: [BERT](https://huggingface.co/google-bert/bert-base-uncased) - [ERNIE](https://huggingface.co/nghuyong/ernie-3.0-base-zh)
65 - - **Original Repository**: [https://github.com/Bruce-ywj/ERNIE-RNA](https://github.com/Bruce-ywj/ERNIE-RNA)
66
67   ## Usage
68
@@ -79,29 +92,29 @@ You can use this model directly with a pipeline for masked language modeling:
79   ```python
80   >>> import multimolecule # you must import multimolecule to register models
81   >>> from transformers import pipeline
82 - >>> unmasker = pipeline('fill-mask', model='multimolecule/ernierna')
83 - >>> unmasker("uagc<mask>uaucagacugauguuga")
84
85 - [{'score': 0.22777850925922394,
86 - 'token': 9,
87 - 'token_str': 'U',
88 - 'sequence': 'U A G C U U A U C A G A C U G A U G U U G A'},
89 - {'score': 0.21105751395225525,
90   'token': 6,
91   'token_str': 'A',
92 - 'sequence': 'U A G C A U A U C A G A C U G A U G U U G A'},
93 - {'score': 0.18962091207504272,
94   'token': 7,
95   'token_str': 'C',
96 - 'sequence': 'U A G C C U A U C A G A C U G A U G U U G A'},
97 - {'score': 0.11191495507955551,
98 - 'token': 8,
99 - 'token_str': 'G',
100 - 'sequence': 'U A G C G U A U C A G A C U G A U G U U G A'},
101 - {'score': 0.09583593904972076,
102   'token': 21,
103   'token_str': '.',
104 - 'sequence': 'U A G C. U A U C A G A C U G A U G U U G A'}]
105   ```
106
107   ### Downstream Use
@@ -114,11 +127,11 @@ Here is how to use this model to get the features of a given sequence in PyTorch
114   from multimolecule import RnaTokenizer, ErnieRnaModel
115
116
117 - tokenizer = RnaTokenizer.from_pretrained('multimolecule/ernierna')
118 - model = ErnieRnaModel.from_pretrained('multimolecule/ernierna')
119
120   text = "UAGCUUAUCAGACUGAUGUUGA"
121 - input = tokenizer(text, return_tensors='pt')
122
123   output = model(**input)
124   ```
@@ -134,17 +147,17 @@ import torch
134   from multimolecule import RnaTokenizer, ErnieRnaForSequencePrediction
135
136
137 - tokenizer = RnaTokenizer.from_pretrained('multimolecule/ernierna')
138 - model = ErnieRnaForSequencePrediction.from_pretrained('multimolecule/ernierna')
139
140   text = "UAGCUUAUCAGACUGAUGUUGA"
141 - input = tokenizer(text, return_tensors='pt')
142   label = torch.tensor([1])
143
144   output = model(**input, labels=label)
145   ```
146
147 - #### Nucleotide Classification / Regression
148
149   **Note**: This model is not fine-tuned for any specific task. You will need to fine-tune the model on a downstream task to use it for nucleotide classification or regression.
150
@@ -152,14 +165,14 @@ Here is how to use this model as backbone to fine-tune for a nucleotide-level ta
152
153   ```python
154   import torch
155 - from multimolecule import RnaTokenizer, ErnieRnaForNucleotidePrediction
156
157
158 - tokenizer = RnaTokenizer.from_pretrained('multimolecule/ernierna')
159 - model = ErnieRnaForNucleotidePrediction.from_pretrained('multimolecule/ernierna')
160
161   text = "UAGCUUAUCAGACUGAUGUUGA"
162 - input = tokenizer(text, return_tensors='pt')
163   label = torch.randint(2, (len(text), ))
164
165   output = model(**input, labels=label)
@@ -176,11 +189,11 @@ import torch
176   from multimolecule import RnaTokenizer, ErnieRnaForContactPrediction
177
178
179 - tokenizer = RnaTokenizer.from_pretrained('multimolecule/ernierna')
180 - model = ErnieRnaForContactPrediction.from_pretrained('multimolecule/ernierna')
181
182   text = "UAGCUUAUCAGACUGAUGUUGA"
183 - input = tokenizer(text, return_tensors='pt')
184   label = torch.randint(2, (len(text), len(text)))
185
186   output = model(**input, labels=label)
 
10   pipeline_tag: fill-mask
11   mask_token: "<mask>"
12   widget:
13 + - example_title: "HIV-1"
14 + text: "GGUC<mask>CUCUGGUUAGACCAGAUCUGAGCCU"
15 + output:
16 + - label: "A"
17 + score: 0.32839149236679077
18 + - label: "U"
19 + score: 0.3044775426387787
20 + - label: "C"
21 + score: 0.09914574027061462
22 + - label: "-"
23 + score: 0.09502048045396805
24 + - label: "."
25 + score: 0.06993662565946579
26   - example_title: "microRNA-21"
27   text: "UAGC<mask>UAUCAGACUGAUGUUGA"
28   output:
 
60   ### Variations
61
62   - **[`multimolecule/ernierna`](https://huggingface.co/multimolecule/ernierna)**: The ERNIE-RNA model pre-trained on non-coding RNA sequences.
63 + - **[`multimolecule/ernierna-ss`](https://huggingface.co/multimolecule/ernierna-ss)**: The ERNIE-RNA model fine-tuned on RNA secondary structure prediction.
64
65   ### Model Specification
66
 
75   - **Paper**: [ERNIE-RNA: An RNA Language Model with Structure-enhanced Representations](https://doi.org/10.1101/2024.03.17.585376)
76   - **Developed by**: Weijie Yin, Zhaoyu Zhang, Liang He, Rui Jiang, Shuo Zhang, Gan Liu, Xuegong Zhang, Tao Qin, Zhen Xie
77   - **Model type**: [BERT](https://huggingface.co/google-bert/bert-base-uncased) - [ERNIE](https://huggingface.co/nghuyong/ernie-3.0-base-zh)
78 + - **Original Repository**: [Bruce-ywj/ERNIE-RNA](https://github.com/Bruce-ywj/ERNIE-RNA)
79
80   ## Usage
81
 
92   ```python
93   >>> import multimolecule # you must import multimolecule to register models
94   >>> from transformers import pipeline
95 + >>> unmasker = pipeline("fill-mask", model="multimolecule/ernierna")
96 + >>> unmasker("gguc<mask>cucugguuagaccagaucugagccu")
97
98 + [{'score': 0.32839149236679077,
99   'token': 6,
100   'token_str': 'A',
101 + 'sequence': 'G G U C A C U C U G G U U A G A C C A G A U C U G A G C C U'},
102 + {'score': 0.3044775426387787,
103 + 'token': 9,
104 + 'token_str': 'U',
105 + 'sequence': 'G G U C U C U C U G G U U A G A C C A G A U C U G A G C C U'},
106 + {'score': 0.09914574027061462,
107   'token': 7,
108   'token_str': 'C',
109 + 'sequence': 'G G U C C C U C U G G U U A G A C C A G A U C U G A G C C U'},
110 + {'score': 0.09502048045396805,
111 + 'token': 24,
112 + 'token_str': '-',
113 + 'sequence': 'G G U C - C U C U G G U U A G A C C A G A U C U G A G C C U'},
114 + {'score': 0.06993662565946579,
115   'token': 21,
116   'token_str': '.',
117 + 'sequence': 'G G U C. C U C U G G U U A G A C C A G A U C U G A G C C U'}]
118   ```
119
120   ### Downstream Use
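In the new HIV-1 example, the widget `output` entries added to the front matter carry the same labels and scores as the candidates returned by the fill-mask pipeline. The following is a minimal sketch, assuming the pipeline result has the list-of-dicts shape shown in the example (keys `score`, `token`, `token_str`, `sequence`), of turning such a result into widget-style `label`/`score` entries. The helpers `to_widget_output` and `to_yaml` are hypothetical and are not part of multimolecule or transformers.

```python
from typing import Any, Dict, List

# Candidates copied from the new fill-mask example (HIV-1 input).
PIPELINE_OUTPUT: List[Dict[str, Any]] = [
    {"score": 0.32839149236679077, "token": 6, "token_str": "A",
     "sequence": "G G U C A C U C U G G U U A G A C C A G A U C U G A G C C U"},
    {"score": 0.3044775426387787, "token": 9, "token_str": "U",
     "sequence": "G G U C U C U C U G G U U A G A C C A G A U C U G A G C C U"},
    {"score": 0.09914574027061462, "token": 7, "token_str": "C",
     "sequence": "G G U C C C U C U G G U U A G A C C A G A U C U G A G C C U"},
    {"score": 0.09502048045396805, "token": 24, "token_str": "-",
     "sequence": "G G U C - C U C U G G U U A G A C C A G A U C U G A G C C U"},
    {"score": 0.06993662565946579, "token": 21, "token_str": ".",
     "sequence": "G G U C. C U C U G G U U A G A C C A G A U C U G A G C C U"},
]


def to_widget_output(candidates: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: map pipeline candidates to {label, score} entries,
    highest score first."""
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    return [{"label": c["token_str"], "score": c["score"]} for c in ranked]


def to_yaml(entries: List[Dict[str, Any]]) -> str:
    """Hypothetical helper: render entries as `- label:` / `score:` YAML lines."""
    lines = []
    for e in entries:
        lines.append(f'- label: "{e["label"]}"')
        lines.append(f"  score: {e['score']}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_yaml(to_widget_output(PIPELINE_OUTPUT)))
```

Sorting by score is only defensive here. The pipeline output in the example is already ordered from highest to lowest score.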
 
127   from multimolecule import RnaTokenizer, ErnieRnaModel
128
129
130 + tokenizer = RnaTokenizer.from_pretrained("multimolecule/ernierna")
131 + model = ErnieRnaModel.from_pretrained("multimolecule/ernierna")
132
133   text = "UAGCUUAUCAGACUGAUGUUGA"
134 + input = tokenizer(text, return_tensors="pt")
135
136   output = model(**input)
137   ```
 
147   from multimolecule import RnaTokenizer, ErnieRnaForSequencePrediction
148
149
150 + tokenizer = RnaTokenizer.from_pretrained("multimolecule/ernierna")
151 + model = ErnieRnaForSequencePrediction.from_pretrained("multimolecule/ernierna")
152
153   text = "UAGCUUAUCAGACUGAUGUUGA"
154 + input = tokenizer(text, return_tensors="pt")
155   label = torch.tensor([1])
156
157   output = model(**input, labels=label)
158   ```
159
160 + #### Token Classification / Regression
161
162   **Note**: This model is not fine-tuned for any specific task. You will need to fine-tune the model on a downstream task to use it for nucleotide classification or regression.
163
 
165
166   ```python
167   import torch
168 + from multimolecule import RnaTokenizer, ErnieRnaForTokenPrediction
169
170
171 + tokenizer = RnaTokenizer.from_pretrained("multimolecule/ernierna")
172 + model = ErnieRnaForTokenPrediction.from_pretrained("multimolecule/ernierna")
173
174   text = "UAGCUUAUCAGACUGAUGUUGA"
175 + input = tokenizer(text, return_tensors="pt")
176   label = torch.randint(2, (len(text), ))
177
178   output = model(**input, labels=label)
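In this commit, the README examples switch from `ErnieRnaForNucleotidePrediction` to `ErnieRnaForTokenPrediction`, and the secondary-structure checkpoint link changes from `multimolecule/ernierna.ss` to `multimolecule/ernierna-ss`. The following is a rough sketch of a hypothetical helper, `update_readme_names`, that applies the same substitutions to the text of downstream snippets. It is a plain string rewrite, not a multimolecule API. This diff also does not establish whether the old class name still resolves in any particular multimolecule release.

```python
import re

# Old -> new identifiers, taken from the README changes in this commit.
RENAMES = {
    "ErnieRnaForNucleotidePrediction": "ErnieRnaForTokenPrediction",
    "multimolecule/ernierna.ss": "multimolecule/ernierna-ss",
}


def update_readme_names(source: str) -> str:
    """Hypothetical helper: rewrite old identifiers to the ones used after this commit.

    The lookarounds reject matches glued to other word characters, so a longer
    name such as `ErnieRnaForNucleotidePredictionHead` is left untouched.
    """
    for old, new in RENAMES.items():
        pattern = rf"(?<![\w.]){re.escape(old)}(?!\w)"
        source = re.sub(pattern, new, source)
    return source


if __name__ == "__main__":
    snippet = (
        "from multimolecule import RnaTokenizer, ErnieRnaForNucleotidePrediction\n"
        "model = ErnieRnaForNucleotidePrediction.from_pretrained('multimolecule/ernierna')"
    )
    print(update_readme_names(snippet))
```

A regex with boundaries is used instead of `str.replace`, so that identifiers which merely contain an old name as a prefix are not rewritten by accident.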
 
189   from multimolecule import RnaTokenizer, ErnieRnaForContactPrediction
190
191
192 + tokenizer = RnaTokenizer.from_pretrained("multimolecule/ernierna")
193 + model = ErnieRnaForContactPrediction.from_pretrained("multimolecule/ernierna")
194
195   text = "UAGCUUAUCAGACUGAUGUUGA"
196 + input = tokenizer(text, return_tensors="pt")
197   label = torch.randint(2, (len(text), len(text)))
198
199   output = model(**input, labels=label)
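The contact-prediction example uses `torch.randint(2, (len(text), len(text)))` as a placeholder label, and that matrix is generally not symmetric. An RNA base-pairing map is symmetric: position i pairs with j exactly when j pairs with i. The following is a minimal sketch of building such an L×L 0/1 matrix in plain Python. It assumes the structure is given in dot-bracket notation, which the README does not specify. This diff also does not state the exact label format `ErnieRnaForContactPrediction` expects. The sketch only mirrors the shape and 0/1 values of the random placeholder.

```python
from typing import List


def dot_bracket_to_contacts(structure: str) -> List[List[int]]:
    """Build a symmetric 0/1 pairing matrix from a dot-bracket string.

    '(' opens a pair, ')' closes the most recent unmatched '(', '.' is unpaired.
    Pseudoknot brackets such as '[' or '{' are deliberately rejected.
    """
    n = len(structure)
    contacts = [[0] * n for _ in range(n)]
    stack: List[int] = []  # positions of '(' still waiting for a partner
    for j, char in enumerate(structure):
        if char == "(":
            stack.append(j)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            i = stack.pop()
            contacts[i][j] = contacts[j][i] = 1  # pairing is symmetric
        elif char != ".":
            raise ValueError(f"unsupported character {char!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return contacts


if __name__ == "__main__":
    # Arbitrary toy structure, not the real fold of any sequence in the README.
    for row in dot_bracket_to_contacts("((..))"):
        print(row)
```

Passing the result to `torch.tensor(...)` would produce a tensor with the same shape and dtype family as the random placeholder. The structure string must have the same length as the sequence.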