Update README.md (#1)
Update README.md (3fc40341191ad6f070124775d856b5515e9a0f06)
Co-authored-by: Arthur Zucker <[email protected]>

README.md CHANGED
@@ -104,6 +104,7 @@ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
 
 outputs = model.generate(input_ids)
 print(tokenizer.decode(outputs[0]))
+>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
 ```
 
 </details>
@@ -127,6 +128,7 @@ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
 
 outputs = model.generate(input_ids)
 print(tokenizer.decode(outputs[0]))
+>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
 ```
 
 </details>
@@ -148,6 +150,7 @@ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
 
 outputs = model.generate(input_ids)
 print(tokenizer.decode(outputs[0]))
+>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
 ```
 
 </details>
@@ -180,7 +183,7 @@ More information needed.
 
 ## Sensitive Use:
 
->
+> SwitchTransformers should not be applied for any unacceptable use cases, e.g., generation of abusive speech.
 
 # Training Details
 
@@ -193,7 +196,7 @@ The model was trained on a Masked Language Modeling task, on Colossal Clean Craw
 
 According to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):
 
-> These models are based on pretrained
+> These models are based on pretrained SwitchTransformers and are not fine-tuned. It is normal if they perform well on zero-shot tasks.
 
 The model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
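For context, the three code hunks above all extend the tail of the same README usage snippet, which the diff does not show in full. Below is a minimal sketch of that example, assuming the google/switch-base-8 checkpoint and a masked prompt consistent with the output line the commit adds; both the model id and the prompt are assumptions, not part of this diff.

```python
# Minimal sketch of the README usage example these hunks extend.
# Assumptions: the google/switch-base-8 checkpoint and this particular
# masked prompt; the diff itself only shows the last few lines.
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-8")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-8")

# T5-style span-corruption prompt: each <extra_id_N> sentinel marks a masked span.
input_text = "A <extra_id_0> walks into a bar and orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
# The decoded string pairs each sentinel with the model's fill for that span, e.g.
# "<pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>"
print(tokenizer.decode(outputs[0]))
```

The hunk headers show the card's GPU variant of the same snippet, which additionally moves the inputs to device 0 with `.input_ids.to(0)`.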
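The sentinel-token format in that output comes from the span-corruption flavor of Masked Language Modeling named in the last hunk. A rough illustrative sketch of how such pairs are built, using our own hypothetical helper rather than anything from the t5x codebase:

```python
# Toy illustration (our own code, not from t5x) of T5-style span corruption,
# the Masked Language Modeling variant behind the <extra_id_N> sentinels.
def span_corrupt(tokens, spans):
    """Replace each (start, end) token span with a sentinel.

    Returns (corrupted_input, target): the input keeps unmasked tokens plus
    sentinels; the target lists each sentinel followed by its masked span.
    """
    corrupted, target = [], []
    prev_end = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        corrupted += tokens[prev_end:start] + [sentinel]
        target += [sentinel] + tokens[start:end]
        prev_end = end
    corrupted += tokens[prev_end:]
    # The real objective closes the target with one extra sentinel, which is
    # why the README output above ends in <extra_id_4>.
    target.append(f"<extra_id_{len(spans)}>")
    return " ".join(corrupted), " ".join(target)

tokens = "A man walks into a bar and orders a beer".split()
inp, tgt = span_corrupt(tokens, [(1, 2), (9, 10)])
print(inp)  # A <extra_id_0> walks into a bar and orders a <extra_id_1>
print(tgt)  # <extra_id_0> man <extra_id_1> beer <extra_id_2>
```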