billingsmoore committed
Commit a89f493 · 1 Parent(s): 85ab26c
Update README.md
README.md CHANGED

@@ -11,7 +11,7 @@ The model's name is taken from 'machine learning' (ML) and translation (Tibetan:
 
 The model expects Tibetan transliterated according to THL Simplified Phonetic Transliteration as an input and outputs an English translation.
 
-The model was evaluated using the BLEU metric, with a final score of 83.4374 on evaluation data.
+The model was evaluated using the BLEU metric as implemented by [sacreBLEU](https://pypi.org/project/sacrebleu/), with a final score of 83.4374 on evaluation data.
 However, this score is unusually high, and may be the result of testing error. Stricter evaluation
 and training are currently in progress.
 
@@ -246,7 +246,8 @@ This model was trained for 6 epochs on the dataset described above.
 
 ## Evaluation
 
-The evaluation metric for this model was the BLEU score
+The evaluation metric for this model was the BLEU score as implemented by [sacreBLEU](https://pypi.org/project/sacrebleu/).
+BLEU (Bilingual Evaluation Understudy) scores measure the quality of
 machine-generated translations by comparing them to human-provided reference translations. The score ranges from 0 to 100,
 where 100 represents a perfect match with the reference translations. It evaluates the precision of n-grams (word sequences)
 in the generated text, with higher scores indicating closer alignment to the reference translations. A brevity penalty is applied
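For context on the metric this change points to, the snippet below is a minimal sketch of computing a corpus-level BLEU score with the sacrebleu Python package linked in the diff. The hypothesis and reference sentences are invented placeholders, not the model's actual evaluation data, and the code is not taken from this repository.

```python
import sacrebleu

# Placeholder system outputs and references; the real evaluation data
# for this model is not shown in this commit.
hypotheses = [
    "May all beings be free from suffering.",
    "This is the root text of the instruction.",
]
references = [
    "May all beings be free of suffering.",
    "This is the root text of the instructions.",
]

# sacreBLEU expects one list of reference strings per reference set,
# so a single reference translation per sentence is wrapped in a list.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])

# Scores are reported on the 0-100 scale used in the README,
# e.g. the 83.4374 quoted above.
print(f"BLEU: {bleu.score:.4f}")
```

The brevity penalty mentioned at the end of the evaluation text lowers the score when the system output is shorter overall than the references, so short but locally precise outputs cannot inflate the n-gram precision on their own.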