byt5-large-wmt14-deen

This model is released as part of the work from "Are Character-level Translations Worth the Wait? Comparing Character- and Subword-level Models for Machine Translation". It is a ByT5 model fine-tuned for German-to-English translation on the WMT14 dataset.

To use the model correctly, you must prefix the input text with "translate X to Y: ", where X and Y are your source and target languages (e.g. German, English).
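As a rough illustration of the prefix requirement, the sketch below builds the prompt and passes it through the Hugging Face transformers library. The helper names build_prompt and translate are hypothetical, the repository id leukas/byt5-large-wmt14-deen is assumed to be this model's Hub id, and the max_new_tokens value is an arbitrary choice (ByT5 generates bytes, so output budgets are counted in bytes rather than words).

```python
def build_prompt(text: str, source_lang: str = "German", target_lang: str = "English") -> str:
    """Prepend the task prefix described in this card to the input text."""
    return f"translate {source_lang} to {target_lang}: {text}"


def translate(text: str,
              model_id: str = "leukas/byt5-large-wmt14-deen",  # assumed Hub id
              max_new_tokens: int = 256) -> str:               # arbitrary budget, in bytes
    """Hypothetical helper: translate one German sentence into English."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Tokenize the prefixed prompt and generate with the checkpoint's own settings.
    inputs = tokenizer(build_prompt(text), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the checkpoint on first use):
# print(translate("Das Haus ist klein."))
```

Keeping the prefix in a separate function makes it easy to check that every input is formatted the same way before it reaches the model.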

NOTE: The decoder_start_token_id is 259 for byt5 models and 250099 for mt5 models, which differs from the default value (0) in Google's byt5 and mt5 models.
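One way to guard against the wrong start token is a small check before generation. This is only a sketch: the helper names and the family lookup by substring are assumptions of this example, and the function works on any object that exposes a decoder_start_token_id attribute (for instance a loaded model's config or generation config in transformers). The two ids are the ones stated in the note.

```python
from types import SimpleNamespace

# Ids taken from the note in this card; the dictionary keys are this sketch's own labels.
DECODER_START_TOKEN_ID = {"byt5": 259, "mt5": 250099}


def expected_start_id(model_name: str) -> int:
    """Return the start token id the note gives for the model's family."""
    name = model_name.lower()
    for family, token_id in DECODER_START_TOKEN_ID.items():
        if family in name:
            return token_id
    raise ValueError(f"no known decoder_start_token_id for {model_name!r}")


def ensure_start_id(config, model_name: str):
    """Set config.decoder_start_token_id to the expected value if it differs."""
    expected = expected_start_id(model_name)
    if getattr(config, "decoder_start_token_id", None) != expected:
        config.decoder_start_token_id = expected
    return config


# Stand-in config object carrying Google's default value of 0.
cfg = ensure_start_id(SimpleNamespace(decoder_start_token_id=0),
                      "leukas/byt5-large-wmt14-deen")
print(cfg.decoder_start_token_id)
```

If the checkpoint is loaded together with its own configuration files, the value is likely already correct; a check like this mainly helps when weights are paired with a config from Google's original models.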

Collection of trained models for the paper: Are Character-level Translations Worth the Wait? • 162 items • Updated Sep 11

Evaluation results

BLEU on wmt14 test set (verified): 0.236
loss on wmt14 test set (verified): 0.301
gen_len on wmt14 test set (self-reported): 20.000