## Model description

- Source: Arabic text (spoken text)
- Target: ArSL gloss
- Domain: translation of Friday sermons from Arabic text to ArSL gloss

We fine-tuned a pre-trained machine translation model (opus-mt) for this specific domain.

## Intended uses & limitations

- Data specificity: the model is trained specifically on Arabic text and ArSL glosses. It may not perform well when applied to other languages or sign languages.
- Contextual accuracy: while the model handles straightforward translations effectively, it might struggle with complex sentences or phrases that require a deep understanding of context, especially when combining or shuffling sentences.
- Generalization to unseen data: the model's performance may degrade on text that differs significantly in style or content from the training data, such as highly specialized jargon or informal language.
- Gloss representation: the model translates text into glosses, a written representation of sign language that does not capture the full complexity of sign language grammar and non-manual signals (facial expressions, body language).
- Test dataset limitations: the test dataset is a shortened version of a sermon and does not cover all possible sentence structures and contexts, which may limit the model's ability to generalize to other domains.
- Ethical considerations: care must be taken when deploying this model in real-world applications, as misinterpretations or inaccuracies in translation can lead to misunderstandings, especially in sensitive communications.

## Training and evaluation data

- Dataset size before augmentation: 131
- Dataset size after augmentation: 8646
- Split of the augmented dataset (training and validation):
  - train: 7349
  - validation: 1297
- Testing: we used a dataset containing the actual Friday sermon phrases, used to generate a short Friday sermon.
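The augmentation method itself is not documented in this card. As a purely hypothetical sketch (the function names and the pairwise-concatenation strategy below are assumptions, not the actual pipeline), a small parallel corpus of (text, gloss) pairs can be expanded by concatenating sentence pairs, in the spirit of the sentence combining mentioned under limitations, and then split roughly 85/15:

```python
import random


def combine_pairs(pairs, seed=0):
    """Hypothetical augmentation: keep the original (text, gloss) pairs and
    add every ordered concatenation of two distinct pairs."""
    augmented = list(pairs)  # keep the originals
    for text_a, gloss_a in pairs:
        for text_b, gloss_b in pairs:
            if (text_a, gloss_a) != (text_b, gloss_b):
                augmented.append((f"{text_a} {text_b}", f"{gloss_a} {gloss_b}"))
    random.Random(seed).shuffle(augmented)
    return augmented


def train_val_split(data, val_fraction=0.15):
    """Split augmented data into (train, validation) by a fixed fraction."""
    n_val = round(len(data) * val_fraction)
    return data[n_val:], data[:n_val]


toy = [("text1", "gloss1"), ("text2", "gloss2"), ("text3", "gloss3")]
augmented = combine_pairs(toy)          # 3 originals + 6 concatenations = 9 pairs
train, validation = train_val_split(augmented)
```

With a 0.15 validation fraction, 8646 examples split into 7349/1297, matching the sizes reported above.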

## Training procedure

### Train and evaluation results

- Loss: 0.464023
- Word BLEU score: 97.08
- Char BLEU score: 98.94
- Runtime (seconds): 562.8277
- Samples per second: 391.718
- Steps per second: 12.26

### Test results

- Loss: 0.289312
- Word BLEU score: 76.92
- Char BLEU score: 86.30
- Runtime (seconds): 1.1038
- Samples per second: 41.67
- Steps per second: 0.91
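Both word-level and character-level BLEU are reported above. The exact tokenization and smoothing behind those scores are not specified in this card; the sketch below is a plain sentence-level BLEU (uniform 4-gram weights, single reference, no smoothing) meant only to illustrate how the two granularities differ: character-level BLEU still awards partial credit when word-level n-grams stop matching.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hyp_tokens, ref_tokens, max_n=4):
    """Plain sentence-level BLEU (0-100): geometric mean of 1..max_n-gram
    precisions times a brevity penalty. No smoothing, single reference."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hyp_tokens, n)
        ref = ngram_counts(ref_tokens, n)
        matched = sum((hyp & ref).values())  # clipped n-gram matches
        total = sum(hyp.values())
        if matched == 0 or total == 0:
            return 0.0                       # any zero precision zeroes the score
        precisions.append(matched / total)
    brevity = 1.0 if len(hyp_tokens) >= len(ref_tokens) else \
        math.exp(1.0 - len(ref_tokens) / len(hyp_tokens))
    return 100.0 * brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)


hyp, ref = "A B C E", "A B C D"
word_bleu = bleu(hyp.split(), ref.split())  # word tokens: no 4-gram match, score 0.0
char_bleu = bleu(list(hyp), list(ref))      # character tokens: partial credit survives
```

Production metrics (for example sacreBLEU) add smoothing and standardized tokenization, so their numbers will not match this sketch exactly.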

### Training hyperparameters