Update README.md
README.md CHANGED
@@ -37,7 +37,7 @@ This is the model card of Plume (**P**arallel **L**ang**u**age **M**od**e**l) wi
## Summary

-Plume is the first LLM trained from scratch for Neural Machine Translation using only parallel Catalan-centric data. It is a language model with the same architecture as Gemma 2B. The model is trained for general translation tasks at the sentence level. For more information about the training, architecture and interpretability of the model, check out the paper "Investigating the translation capabilities of Large Language Models trained on parallel data only". The preprint is available on [arXiv]().
+Plume is the first LLM trained from scratch for Neural Machine Translation using only parallel Catalan-centric data. It is a language model with the same architecture as Gemma 2B. The model is trained for general translation tasks at the sentence level. For more information about the training, architecture and interpretability of the model, check out the paper "Investigating the translation capabilities of Large Language Models trained on parallel data only". The preprint is available on [arXiv](https://arxiv.org/abs/2406.09140).

- **Developed by:** The Language Technologies Unit from Barcelona Supercomputing Center (BSC).
- **Languages:** Spanish, French, Italian, Portuguese, Galician, German, English, and Basque.
@@ -47,7 +47,7 @@ Plume is the first LLM trained for Neural Machine Translation with only parallel
In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methodologies predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce Plume (**P**arallel **L**ang**u**age **M**od**e**l), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones.

-For more details regarding the model architecture, the dataset and model interpretability, take a look at the paper.
+For more details regarding the model architecture, the dataset and model interpretability, take a look at the [paper](https://arxiv.org/abs/2406.09140).

## Intended Uses and Limitations
@@ -96,11 +96,11 @@ For training, the learning rate is warmed up from 1e-7 to a maximum of 3e-4 over
| Warmup Steps | 2000 |

-More training details are specified in the [paper](). Code for training the model and running other experiments can be found in our [GitHub repository](https://github.com/projecte-aina/Plume).
+More training details are specified in the [paper](https://arxiv.org/abs/2406.09140). Code for training the model and running other experiments can be found in our [GitHub repository](https://github.com/projecte-aina/Plume).

## Evaluation

-Below are the evaluation results on Flores-200 and NTREX for supervised MT directions. For more details about model evaluation, check out the [paper]().
+Below are the evaluation results on Flores-200 and NTREX for supervised MT directions. For more details about model evaluation, check out the [paper](https://arxiv.org/abs/2406.09140).

| Model | FLORES BLEU | FLORES COMET | NTREX BLEU | NTREX COMET |
|----------------------|-------------|--------------|------------|-------------|
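The updated card points to the GitHub repository for training and inference code. As a rough illustration of the sentence-level translation use the summary describes, here is a minimal sketch of querying a Plume checkpoint with Hugging Face Transformers. The model ID (`projecte-aina/Plume32k`) and the tag-based prompt template are assumptions, not taken from the card; consult the repository and paper for the exact checkpoint names and prompt format.

```python
# Minimal sketch (not from the card): load a Plume checkpoint with Hugging Face
# Transformers and ask for a sentence-level translation. The model ID and the
# tag-based prompt below are assumptions; check the Plume GitHub repository and
# paper for the exact checkpoint names and prompt format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "projecte-aina/Plume32k"  # assumed ID of the 32k-vocabulary variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Assumed prompt: source sentence followed by a target-language tag, since the
# paper describes tag-conditioned translation; the exact tag spelling may differ.
src_sentence = "Benvinguts al projecte Plume."
prompt = f"{src_sentence}\n<eng>"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, num_beams=5)

# Keep only the newly generated tokens (the translation), dropping the prompt.
translation = tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(translation)
```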