ybelkada ArthurZ HF staff committed on
Commit f44f540 · 1 Parent(s): 29e433b

Update README.md (#1)

- Update README.md (3fc40341191ad6f070124775d856b5515e9a0f06)


Co-authored-by: Arthur Zucker <[email protected]>

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -104,6 +104,7 @@ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
 
  outputs = model.generate(input_ids)
  print(tokenizer.decode(outputs[0]))
+ >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
  ```
 
  </details>
@@ -127,6 +128,7 @@ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
 
  outputs = model.generate(input_ids)
  print(tokenizer.decode(outputs[0]))
+ >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
  ```
 
  </details>
@@ -148,6 +150,7 @@ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
 
  outputs = model.generate(input_ids)
  print(tokenizer.decode(outputs[0]))
+ >>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
  ```
 
  </details>
@@ -180,7 +183,7 @@ More information needed.
 
  ## Sensitive Use:
 
- > Flan-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech.
+ > SwitchTransformers should not be applied for any unacceptable use cases, e.g., generation of abusive speech.
 
  # Training Details
 
@@ -193,7 +196,7 @@ The model was trained on a Masked Language Modeling task, on Colossal Clean Craw
 
  According to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):
 
- > These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size.
+ > These models are based on pretrained SwitchTransformers and are not fine-tuned. It is normal if they perform well on zero-shot tasks.
 
  The model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
 
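
Note on the added output line: the decoded string interleaves sentinel tokens (`<extra_id_N>`) with the text the model predicts for each masked span. The input prompt is not visible in this diff, so the following is only a minimal sketch, assuming the usual T5-style convention that the text after `<extra_id_k>` fills mask `k`. The helper name `split_sentinel_spans` is hypothetical and not part of `transformers`; it works on the decoded string alone.

```python
import re
from typing import Dict

# Matches T5-style sentinel tokens such as <extra_id_0>.
SENTINEL = re.compile(r"<extra_id_(\d+)>")


def split_sentinel_spans(decoded: str) -> Dict[int, str]:
    """Map each sentinel index to the text that follows it (hypothetical helper)."""
    # Drop the special tokens that frame the decoded sequence.
    text = decoded.replace("<pad>", "").replace("</s>", "")
    # With one capture group, re.split yields [prefix, id0, span0, id1, span1, ...].
    parts = SENTINEL.split(text)
    return {int(idx): span.strip() for idx, span in zip(parts[1::2], parts[2::2])}


# The decoded output string added by this commit.
decoded = "<pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>"
spans = split_sentinel_spans(decoded)
print(spans)
```

Passing `skip_special_tokens=True` to `tokenizer.decode` would normally remove `<pad>` and `</s>` up front; whether it also strips the sentinel tokens depends on the tokenizer configuration, so the sketch parses the raw string instead.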