amurienne/kellag-m2m100


amurienne committed 11 days ago

Commit 7711cbc · verified · 1 Parent(s): 9d2905a

Update README.md

Files changed (1)

README.md +41 -3

README.md CHANGED

@@ -1,3 +1,41 @@
----
-license: mit
----

+---
+license: mit
+datasets:
+- Bretagne/ofis_publik_br-fr
+- Bretagne/OpenSubtitles_br_fr
+- Bretagne/Autogramm_Breton_translation
+language:
+- fr
+- br
+base_model:
+- facebook/m2m100_418M
+pipeline_tag: translation
+library_name: transformers
+---
+# Kellag
+* **Kellag** is a Breton -> French translation model.
+* Kellag is the temporary "brother" model of [Gallek](https://huggingface.co/amurienne/gallek-m2m100): a bidirectional fr <-> br model is still a work in progress.
+* The current model version reached a **BLEU score of 50** after 10 epochs on a 20% split of the training set.
+* Fine-tuned in one direction only (br -> fr) for now.
+* Training details are available in the [GweLLM GitHub repository](https://github.com/blackccpie/GweLLM).
+Sample test code:
+```python
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
+
+# Hugging Face Hub id of the fine-tuned checkpoint
+modelcard = "amurienne/kellag-m2m100"
+model = AutoModelForSeq2SeqLM.from_pretrained(modelcard)
+tokenizer = AutoTokenizer.from_pretrained(modelcard)
+
+# Breton (br) -> French (fr) translation pipeline, running on CPU
+translation_pipeline = pipeline("translation", model=model, tokenizer=tokenizer, src_lang='br', tgt_lang='fr', max_length=512, device="cpu")
+
+# The sample input starts with a Breton instruction prefix ("translate from Breton to French:")
+breton_text = "treiñ eus ar brezhoneg d'ar galleg: deskiñ a ran brezhoneg er skol."
+result = translation_pipeline(breton_text)
+print(result[0]['translation_text'])
+```
+A demo is available on the [Gallek Space](https://huggingface.co/spaces/amurienne/Gallek).
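
The sample in the README translates a single sentence. As a minimal sketch of how the same pipeline might be wrapped to handle several sentences at once, the helper below prepends the instruction prefix seen in the sample input and collects the translated strings. The names `PREFIX`, `translate_batch` and `load_kellag` are hypothetical and not part of the repository, and whether the model strictly needs the prefix on every input is an assumption taken from the sample, not something the README states.

```python
from typing import Callable, Dict, List

# Instruction prefix copied from the sample input in the README.
# Assumption: every input should carry it, as the sample does.
PREFIX = "treiñ eus ar brezhoneg d'ar galleg: "


def translate_batch(
    sentences: List[str],
    translator: Callable[[List[str]], List[Dict[str, str]]],
    prefix: str = PREFIX,
) -> List[str]:
    """Prepend the prefix to each sentence and return the translated strings.

    `translator` is any callable mapping a list of strings to a list of dicts
    with a 'translation_text' key, the shape a transformers translation
    pipeline returns for a list of inputs.
    """
    prompts = [prefix + s.strip() for s in sentences]
    outputs = translator(prompts)
    return [out["translation_text"] for out in outputs]


def load_kellag(device: str = "cpu"):
    """Build the br -> fr pipeline with the same arguments as the README sample."""
    # Imported lazily so translate_batch can be exercised without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

    model_id = "amurienne/kellag-m2m100"
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return pipeline(
        "translation",
        model=model,
        tokenizer=tokenizer,
        src_lang="br",
        tgt_lang="fr",
        max_length=512,
        device=device,
    )
```

A call such as `translate_batch(["deskiñ a ran brezhoneg er skol."], load_kellag())` would download the checkpoint on first use. Keeping the translator as a plain callable argument means the prefix handling can be checked with a stub, without loading the model.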