BounharAbdelaziz committed
Commit cba44cb
1 Parent(s): 02cb712

Update README.md

Files changed (1)
  1. README.md +16 -15
README.md CHANGED
@@ -1,6 +1,8 @@
 ---
 license: cc-by-nc-4.0
 base_model: Helsinki-NLP/opus-mt-tc-big-en-ar
+metrics:
+- bleu
 model-index:
 - name: Terjman-Large-v2
   results: []
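The `metrics: bleu` entry added in this hunk declares how the model is scored. A minimal sketch of computing BLEU with the `evaluate` library; the sentences are invented placeholders, and the card does not say which BLEU implementation was actually used:

```python
import evaluate

# Minimal sketch of the BLEU scoring declared in the metadata above.
# The example strings are invented placeholders, not outputs of this model.
bleu = evaluate.load("bleu")
results = bleu.compute(
    predictions=["hello, how are you?"],
    references=[["hello, how are you doing?"]],
)
print(results["bleu"])  # corpus-level BLEU score in [0, 1]
```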
@@ -8,17 +10,16 @@ datasets:
 - atlasia/darija_english
 language:
 - ar
+- en
 ---
 
-# Transliteration-Moroccan-Darija
-
-This model is trained to translate English text (en) into Moroccan Darija text (Ary) written in Arabic letters.
-
-## Model Overview
+# Terjman-Large-v2 (240M params)
 
 Our model is built upon the powerful Transformer architecture, leveraging state-of-the-art natural language processing techniques.
-It has been finetuned on a the "atlasia/darija_english" dataset enhanced with curated corpora ensuring high-quality and accurate translations.
+It has been finetuned on the [darija_english](atlasia/darija_english) dataset, enhanced with curated corpora ensuring high-quality and accurate translations.
+This model is an improvement over the previous version, [Terjman-Large](atlasia/Terjman-Large).
 
+The finetuning was conducted on an **A100-40GB** GPU and took **17 hours**.
 
 ## Training hyperparameters
 
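The finetuning corpus named in this hunk can be pulled with the `datasets` library. A minimal sketch, where the available splits and column names are assumptions to be inspected rather than facts documented in the card:

```python
from datasets import load_dataset

# Load the finetuning corpus referenced above. If the repository defines
# several configurations, a config name would also be required (assumption).
dataset = load_dataset("atlasia/darija_english")

# Inspect splits and columns before building a finetuning pipeline.
print(dataset)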
```
@@ -34,13 +35,6 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_ratio: 0.03
 - num_epochs: 30
 
-## Framework versions
-
-- Transformers 4.39.2
-- Pytorch 2.2.2+cpu
-- Datasets 2.18.0
-- Tokenizers 0.15.2
-
 ## Usage
 
 Using our model for translation is simple and straightforward.
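The hyperparameter list shown in this hunk maps naturally onto `transformers`' `Seq2SeqTrainingArguments`. A minimal sketch in which only `warmup_ratio` and `num_train_epochs` come from the card; every other value is an illustrative assumption:

```python
from transformers import Seq2SeqTrainingArguments

# Only warmup_ratio and num_train_epochs are taken from the hyperparameter
# list above; all other values are illustrative assumptions.
training_args = Seq2SeqTrainingArguments(
    output_dir="terjman-large-v2-finetune",  # hypothetical output path
    learning_rate=3e-5,                      # assumption: not shown in this hunk
    per_device_train_batch_size=16,          # assumption: not shown in this hunk
    warmup_ratio=0.03,                       # lr_scheduler_warmup_ratio above
    num_train_epochs=30,                     # num_epochs above
    predict_with_generate=True,              # generate during eval for BLEU-style metrics
)
```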
@@ -66,7 +60,7 @@ output_tokens = model.generate(**input_tokens)
 # Decode the output tokens
 output_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)
 
-print("Transliteration:", output_text)
+print("Translation:", output_text)
 ```
 
 ## Example
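The Usage snippet is only partially visible in this hunk. A self-contained sketch of the same flow, assuming the checkpoint id `atlasia/Terjman-Large-v2` (inferred from the model-index name above) and the standard `transformers` seq2seq API:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Checkpoint id inferred from the model-index name; treat it as an
# assumption if the repository is published under a different name.
model_name = "atlasia/Terjman-Large-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Tokenize the English source sentence
input_tokens = tokenizer("Hello, how are you today?", return_tensors="pt")

# Generate the Moroccan Darija translation
output_tokens = model.generate(**input_tokens)

# Decode the output tokens
output_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)
print("Translation:", output_text)
```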
@@ -86,4 +80,11 @@ We're currently collecting more data with the aim of continuous improvements.
 ## Feedback
 
 We're continuously striving to improve our model's performance and usability, and we will be improving it incrementally.
-If you have any feedback, suggestions, or encounter any issues, please don't hesitate to reach out to us.
+If you have any feedback, suggestions, or encounter any issues, please don't hesitate to reach out to us.
+
+## Framework versions
+
+- Transformers 4.39.2
+- Pytorch 2.2.2+cpu
+- Datasets 2.18.0
+- Tokenizers 0.15.2