Fix paper link
README.md CHANGED
@@ -14,7 +14,7 @@ ByT5 was only pre-trained on [mC4](https://www.tensorflow.org/datasets/catalog/c
 
 ByT5 works especially well on noisy text data, *e.g.*, `google/byt5-xxl` significantly outperforms [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [TweetQA](https://arxiv.org/abs/1907.06292).
 
-Paper: [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/
+Paper: [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626)
 
 Authors: *Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel*
 
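Since the linked paper describes ByT5 as token-free, operating directly on UTF-8 bytes rather than subword tokens, a minimal sketch of feeding bytes to a ByT5 checkpoint with Hugging Face Transformers is shown below. This is not part of the commit above; it assumes `transformers` and `torch` are installed, uses the smaller `google/byt5-small` checkpoint for illustration, and the example strings are made up.

```python
# Minimal sketch (not part of this commit): ByT5 consumes raw UTF-8 bytes.
# Assumes the `transformers` and `torch` packages are installed.
import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google/byt5-small")

# Encode text as UTF-8 bytes; ByT5 reserves the first few IDs for special
# tokens, so byte values are shifted by 3 (as in the ByT5 model card).
input_ids = torch.tensor([list("S0me no1sy tw33t text".encode("utf-8"))]) + 3
labels = torch.tensor([list("Some noisy tweet text".encode("utf-8"))]) + 3

# Forward pass returns a standard seq2seq loss over the byte-level targets.
loss = model(input_ids, labels=labels).loss
```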