Safetensors · llama
jisx committed on
Commit
b87ec2a
1 Parent(s): 6e295ea

Update README.md

Files changed (1)
  1. README.md +16 -9
README.md CHANGED
@@ -1,11 +1,11 @@
- ---
- license: llama2
- datasets:
- - MaLA-LM/PolyWrite
- - Davlan/sib200
- base_model:
- - meta-llama/Llama-2-7b-hf
- ---
+ ---
+ license: llama2
+ datasets:
+ - MaLA-LM/PolyWrite
+ - Davlan/sib200
+ base_model:
+ - meta-llama/Llama-2-7b-hf
+ ---

  # EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models

@@ -61,4 +61,11 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))

  Challenges remain in low-resource languages, where the model tends to have higher **Self-BLEU** scores, indicating reduced output diversity.

- ---
+ ---
+
+
+ ## Acknowledgements
+
+ We extend our thanks to the language communities and contributors who helped source, clean, and validate the diverse data used in the MaLA Corpus. Their efforts are invaluable in supporting linguistic diversity in AI research.
+
+ This work is created by researchers at [Helsinki-NLP](https://huggingface.co/Helsinki-NLP) in collaboration with partners from TU Darmstadt, the University of Edinburgh, and LMU Munich. It is funded by [HPLT](https://hplt-project.org) and [UTTER](https://he-utter.eu).
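For context, the second hunk's header shows the tail of the README's generation example (`print(tokenizer.decode(outputs[0], skip_special_tokens=True))`). A minimal sketch of what such a snippet typically looks like with Transformers; the repo id below is an assumption for illustration, since this commit does not name one:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MaLA-LM/emma-500-llama2-7b"  # hypothetical repo id, not confirmed by this commit
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision; assumes a GPU is available
    device_map="auto",
)

# Generate a short continuation and decode it, ending with the same
# decode call that appears in the diff's hunk context.
inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```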
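The updated README also cites **Self-BLEU** as evidence of reduced output diversity in low-resource languages. A minimal sketch of how Self-BLEU is commonly computed (each generation scored against all the others; higher means less diverse), using `sacrebleu`; the helper name `self_bleu` is ours, not from the model card:

```python
from sacrebleu import sentence_bleu

def self_bleu(samples: list[str]) -> float:
    """Average BLEU of each generation against all the others.

    Higher values mean the generations resemble one another more,
    i.e. lower output diversity.
    """
    scores = []
    for i, hyp in enumerate(samples):
        refs = samples[:i] + samples[i + 1:]  # all other generations as references
        scores.append(sentence_bleu(hyp, refs).score)
    return sum(scores) / len(scores)

# Two near-duplicates push the score up; a distinct third output pulls it down.
print(self_bleu([
    "The cat sat on the mat.",
    "The cat sat on a mat.",
    "A dog ran through the park.",
]))
```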