metadata
language:
  - en
  - de
  - multilingual
license: cc-by-4.0
tags:
  - translation
  - opus-mt
model-index:
  - name: opus-mt-eng-deu
    results:
      - task:
          type: translation
          name: Translation eng-deu
        dataset:
          name: Tatoeba-test.eng-deu
          type: tatoeba_mt
          args: eng-deu
        metrics:
          - type: bleu
            value: 45.8
            name: BLEU

Opus Tatoeba English-German

*This model was obtained by running the script convert_marian_to_pytorch.py; instructions are available here. The original models were trained by Jörg Tiedemann using the MarianNMT library. See all available MarianMTModel models on the profile of the Helsinki NLP group.*

*This is the conversion of the checkpoint opus-2021-02-22.zip.*
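
Since the converted checkpoint is a MarianMTModel, it can presumably be loaded with the standard `transformers` classes. The following is a minimal sketch, not an official usage recipe: `MODEL_ID` is a hypothetical placeholder that must be replaced with the actual Hub repository id of this model, and the `transformers`, `torch`, and `sentencepiece` packages are assumed to be installed.

```python
# Hypothetical placeholder: replace with the real Hub repository id of this model.
MODEL_ID = "your-namespace/opus-mt-eng-deu"


def translate(sentences, model_id=MODEL_ID):
    """Translate a list of English sentences into German.

    The imports are kept inside the function so that the module can be
    loaded even where `transformers` is not installed.
    """
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    # Tokenize with padding so that sentences of different lengths form one batch.
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)

    # Generate the translations and decode them back to plain text.
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `translate(["How are you today?"])` downloads the weights on first use and returns a list with one German string per input sentence.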


eng-deu

  • source language name: English

  • target language name: German

  • OPUS readme: README.md

  • model: transformer

  • source language code: en

  • target language code: de

  • dataset: opus

  • release date: 2021-02-22

  • pre-processing: normalization + SentencePiece (spm32k,spm32k)

  • download original weights: opus-2021-02-22.zip

  • Training data:

    • deu-eng: Tatoeba-train (86845165)
  • Validation data:

    • deu-eng: Tatoeba-dev, 284809
    • total-size-shuffled: 284809
    • devset-selected: top 5000 lines of Tatoeba-dev.src.shuffled!
  • Test data:

    • newssyscomb2009.eng-deu: 502/11271
    • news-test2008.eng-deu: 2051/47427
    • newstest2009.eng-deu: 2525/62816
    • newstest2010.eng-deu: 2489/61511
    • newstest2011.eng-deu: 3003/72981
    • newstest2012.eng-deu: 3003/72886
    • newstest2013.eng-deu: 3000/63737
    • newstest2014-deen.eng-deu: 3003/62964
    • newstest2015-ende.eng-deu: 2169/44260
    • newstest2016-ende.eng-deu: 2999/62670
    • newstest2017-ende.eng-deu: 3004/61291
    • newstest2018-ende.eng-deu: 2998/64276
    • newstest2019-ende.eng-deu: 1997/48969
    • Tatoeba-test.eng-deu: 10000/83347
  • test set translations file: test.txt

  • test set scores file: eval.txt

  • BLEU-scores

    | Test set | BLEU |
    |---|---|
    | newstest2018-ende.eng-deu | 46.4 |
    | Tatoeba-test.eng-deu | 45.8 |
    | newstest2019-ende.eng-deu | 42.4 |
    | newstest2016-ende.eng-deu | 37.9 |
    | newstest2015-ende.eng-deu | 32.0 |
    | newstest2017-ende.eng-deu | 30.6 |
    | newstest2014-deen.eng-deu | 29.6 |
    | newstest2013.eng-deu | 27.6 |
    | newstest2010.eng-deu | 25.9 |
    | news-test2008.eng-deu | 23.9 |
    | newstest2012.eng-deu | 23.8 |
    | newssyscomb2009.eng-deu | 23.3 |
    | newstest2011.eng-deu | 22.9 |
    | newstest2009.eng-deu | 22.7 |
  • chr-F-scores

    | Test set | chr-F |
    |---|---|
    | newstest2018-ende.eng-deu | 0.697 |
    | newstest2019-ende.eng-deu | 0.664 |
    | Tatoeba-test.eng-deu | 0.655 |
    | newstest2016-ende.eng-deu | 0.644 |
    | newstest2015-ende.eng-deu | 0.601 |
    | newstest2014-deen.eng-deu | 0.595 |
    | newstest2017-ende.eng-deu | 0.593 |
    | newstest2013.eng-deu | 0.558 |
    | newstest2010.eng-deu | 0.55 |
    | newssyscomb2009.eng-deu | 0.539 |
    | news-test2008.eng-deu | 0.533 |
    | newstest2009.eng-deu | 0.533 |
    | newstest2012.eng-deu | 0.53 |
    | newstest2011.eng-deu | 0.528 |
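
For readers unfamiliar with chr-F, the sketch below illustrates the idea behind the metric: an F-score over character n-grams of a translation and its reference. It is a simplified, sentence-level illustration and is not the evaluation tooling that produced the scores above, which are corpus-level. The function names, the choice of n-gram orders 1 to 6, and `beta=2.0` are assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count the character n-grams of `text`, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level chr-F in the range 0.0 to 1.0."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one of the texts is shorter than n characters
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta score: beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

A translation identical to its reference scores 1.0 and one sharing no characters with it scores 0.0; the values in the table fall between these extremes.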