Commit 9d8f8ca (parent 0198c1c) by patrickvonplaten: Update README.md

README.md CHANGED
@@ -97,58 +97,70 @@ This model was contributed by [Daniel Hesslow](https://huggingface.co/Seledorn).

Removed:

## Examples

The following shows how one can predict masked passages using the different denoising strategies.

```python
from transformers import T5ForConditionalGeneration, AutoTokenizer
```

# Example usage

```python
with torch.no_grad():
    for inp in inps:
        inputs = tokenizer(inp, return_tensors="pt").input_ids

        inputs_ = inputs.cuda()
        outputs = model.generate(inputs_, max_length=200, do_sample=True, temperature=0.9, num_return_sequences=4)
        for output in outputs:
            out = tokenizer.decode(output)

            inps = re.split(pattern, inp)
            outs = re.split(pattern, out)

            l = [z for (x, y) in zip(inps, outs[1:len(inps)] + [""]) for z in (x, "*" + y + "*" if y != "" else "")]
            print("".join(l))
            print("-------------------------------")
```
## Examples

The following shows how one can predict masked passages using the different denoising strategies.

Given the size of the model, the following examples need to be run on at least a 40GB A100 GPU.
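As a rough illustration of what these denoisers are trained on, the sketch below shows T5-style span corruption with sentinel tokens in plain Python. This is a simplified, hypothetical helper for intuition only; UL2's actual preprocessing (span sampling, tokenization, mode-token handling) differs.

```python
# Illustration only: replace chosen word spans with <extra_id_i> sentinels,
# producing the corrupted input and the target the model must predict.
# Not the actual UL2 preprocessing.

def corrupt_spans(words, spans):
    """words: list of tokens; spans: list of (start, end) word indices."""
    inp, tgt = [], []
    last = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(words[last:start])   # keep text before the span
        inp.append(sentinel)            # mask the span in the input
        tgt.append(sentinel)            # the target repeats the sentinel...
        tgt.extend(words[start:end])    # ...followed by the masked words
        last = end
    inp.extend(words[last:])
    return " ".join(inp), " ".join(tgt)

words = "Mr. Dursley was the director of a firm called Grunnings".split()
inp, tgt = corrupt_spans(words, [(4, 5), (9, 10)])
print(inp)  # Mr. Dursley was the <extra_id_0> of a firm called <extra_id_1>
print(tgt)  # <extra_id_0> director <extra_id_1> Grunnings
```

The R-, S-, and X-denoisers differ essentially in how many spans are masked and how long they are (short spans, a single long suffix, or many/long spans respectively).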
### S-Denoising

For *S-Denoising*, please make sure to prompt the text with the prefix `[S2S]` as shown below.

```python
from transformers import T5ForConditionalGeneration, AutoTokenizer
import torch

model = T5ForConditionalGeneration.from_pretrained("google/ul2", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("google/ul2")

input_string = "[S2S] Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, solid man with a bald head. Mrs. Dursley was thin and blonde and more than the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbours. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere <extra_id_0>"

inputs = tokenizer(input_string, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(inputs, max_length=200)

print(tokenizer.decode(outputs[0]))
# -> <pad>. Dudley was a very good boy, but he was also very stupid.</s>
```
### R-Denoising

For *R-Denoising*, please make sure to prompt the text with the prefix `[NLU]` as shown below.

```python
from transformers import T5ForConditionalGeneration, AutoTokenizer
import torch

model = T5ForConditionalGeneration.from_pretrained("google/ul2", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("google/ul2")

input_string = "[NLU] Mr. Dursley was the director of a firm called <extra_id_0>, which made <extra_id_1>. He was a big, solid man with a bald head. Mrs. Dursley was thin and <extra_id_2> of neck, which came in very useful as she spent so much of her time <extra_id_3>. The Dursleys had a small son called Dudley and <extra_id_4>"

inputs = tokenizer(input_string, return_tensors="pt", add_special_tokens=False).input_ids.to("cuda")

outputs = model.generate(inputs, max_length=200)

print(tokenizer.decode(outputs[0]))
# -> "<pad><extra_id_0> Burrows<extra_id_1> brooms for witches and wizards<extra_id_2> had a lot<extra_id_3> scolding Dudley<extra_id_4> a daughter called Petunia. Dudley was a nasty, spoiled little boy who was always getting into trouble. He was very fond of his pet rat, Scabbers.<extra_id_5> Burrows<extra_id_3> screaming at him<extra_id_4> a daughter called Petunia</s>"
```
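A sentinel-delimited prediction like the one above can be spliced back into the masked input. The helper below is a hypothetical sketch, not part of `transformers`: it splits both strings on the `<extra_id_N>` markers and interleaves the pieces (in practice you would first strip `<pad>` and `</s>` from the decoded string).

```python
import re

# Hypothetical helper: merge a sentinel-masked input with the
# sentinel-delimited spans a T5-style model generated for it.
SENTINEL = re.compile(r"<extra_id_\d+>")

def fill_spans(masked_input, decoded_output):
    # Pieces of the original text between sentinels.
    inp_parts = SENTINEL.split(masked_input)
    # Predicted spans; the decoded output starts with a sentinel,
    # so the (empty) piece before it is dropped.
    out_parts = SENTINEL.split(decoded_output)[1:]
    filled = []
    for i, part in enumerate(inp_parts):
        filled.append(part)
        if i < len(out_parts):
            filled.append(out_parts[i])
    return "".join(filled)

masked = "Mr. Dursley was the director of a firm called<extra_id_0>, which made<extra_id_1>."
decoded = "<extra_id_0> Burrows<extra_id_1> brooms"
print(fill_spans(masked, decoded))
# -> Mr. Dursley was the director of a firm called Burrows, which made brooms.
```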
### X-Denoising

For *X-Denoising*, please make sure to prompt the text with the prefix `[NLG]` as shown below.

```python
from transformers import T5ForConditionalGeneration, AutoTokenizer
import torch

model = T5ForConditionalGeneration.from_pretrained("google/ul2", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("google/ul2")

input_string = "[NLG] Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, solid man with a bald head. Mrs. Dursley was thin and blonde and more than the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbours. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere. <extra_id_0>"

inputs = tokenizer(input_string, return_tensors="pt", add_special_tokens=False).input_ids.to("cuda")

outputs = model.generate(inputs, max_length=200)

print(tokenizer.decode(outputs[0]))
# -> "<pad><extra_id_0> Burrows<extra_id_1> a lot of money from the manufacture of a product called '' Burrows'''s ''<extra_id_2> had a lot<extra_id_3> looking down people's throats<extra_id_4> a daughter called Petunia. Dudley was a very stupid boy who was always getting into trouble. He was a big, fat, ugly boy who was always getting into trouble. He was a big, fat, ugly boy who was always getting into trouble. He was a big, fat, ugly boy who was always getting into trouble. He was a big, fat, ugly boy who was always getting into trouble. He was a big, fat, ugly boy who was always getting into trouble. He was a big, fat, ugly boy who was always getting into trouble. He was a big, fat, ugly boy who was always getting into trouble. He was a big, fat,"
```