Update README.md (#1)
Update README.md (3fc40341191ad6f070124775d856b5515e9a0f06)
Co-authored-by: Arthur Zucker <[email protected]>

README.md CHANGED
@@ -104,6 +104,7 @@ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
 
 outputs = model.generate(input_ids)
 print(tokenizer.decode(outputs[0]))
+>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
 ```
 
 </details>
@@ -127,6 +128,7 @@ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
 
 outputs = model.generate(input_ids)
 print(tokenizer.decode(outputs[0]))
+>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
 ```
 
 </details>
@@ -148,6 +150,7 @@ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
 
 outputs = model.generate(input_ids)
 print(tokenizer.decode(outputs[0]))
+>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
 ```
 
 </details>
@@ -180,7 +183,7 @@ More information needed.
 
 ## Sensitive Use:
 
->
+> SwitchTransformers should not be applied for any unacceptable use cases, e.g., generation of abusive speech.
 
 # Training Details
 
@@ -193,7 +196,7 @@ The model was trained on a Masked Language Modeling task, on Colossal Clean Craw
 
 According to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):
 
-> These models are based on pretrained
+> These models are based on pretrained SwitchTransformers and are not fine-tuned. It is normal if they perform well on zero-shot tasks.
 
 The model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
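For context, the three code hunks above all extend the tail of the same README usage snippet, which the diff does not show in full. Below is a minimal sketch of that example, assuming the google/switch-base-8 checkpoint and a masked prompt consistent with the output line the commit adds; both the model id and the prompt are assumptions, not part of this diff.

```python
# Minimal sketch of the README usage example these hunks extend.
# Assumptions: the google/switch-base-8 checkpoint and this particular
# masked prompt; the diff itself only shows the last few lines.
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-8")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-8")

# T5-style span-corruption prompt: each <extra_id_N> sentinel marks a masked span.
input_text = "A <extra_id_0> walks into a bar and orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
# The decoded string pairs each sentinel with the model's fill for that span, e.g.
# "<pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>"
print(tokenizer.decode(outputs[0]))
```

The hunk headers show the card's GPU variant of the same snippet, which additionally moves the inputs to device 0 with `.input_ids.to(0)`.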
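The sentinel-token format in that output comes from the span-corruption flavor of Masked Language Modeling named in the last hunk. A rough illustrative sketch of how such pairs are built, using our own hypothetical helper rather than anything from the t5x codebase:

```python
# Toy illustration (our own code, not from t5x) of T5-style span corruption,
# the Masked Language Modeling variant behind the <extra_id_N> sentinels.
def span_corrupt(tokens, spans):
    """Replace each (start, end) token span with a sentinel.

    Returns (corrupted_input, target): the input keeps unmasked tokens plus
    sentinels; the target lists each sentinel followed by its masked span.
    """
    corrupted, target = [], []
    prev_end = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        corrupted += tokens[prev_end:start] + [sentinel]
        target += [sentinel] + tokens[start:end]
        prev_end = end
    corrupted += tokens[prev_end:]
    # The real objective closes the target with one extra sentinel, which is
    # why the README output above ends in <extra_id_4>.
    target.append(f"<extra_id_{len(spans)}>")
    return " ".join(corrupted), " ".join(target)

tokens = "A man walks into a bar and orders a beer".split()
inp, tgt = span_corrupt(tokens, [(1, 2), (9, 10)])
print(inp)  # A <extra_id_0> walks into a bar and orders a <extra_id_1>
print(tgt)  # <extra_id_0> man <extra_id_1> beer <extra_id_2>
```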