Add description to card metadata
The BLEU score has some undesirable properties when used for single
sentences, as it was designed to be a corpus measure. We therefore
use a slightly different score for our RL experiments which we call
the 'GLEU score'. For the GLEU score, we record all sub-sequences of
1, 2, 3 or 4 tokens in output and target sequence (n-grams). We then
compute a recall, which is the ratio of the number of matching n-grams
to the number of total n-grams in the target (ground truth) sequence,
and a precision, which is the ratio of the number of matching n-grams
to the number of total n-grams in the generated output sequence. Then
GLEU score is simply the minimum of recall and precision. This GLEU
score's range is always between 0 (no matches) and 1 (all match) and
it is symmetrical when switching output and target. According to
our experiments, GLEU score correlates quite well with the BLEU
metric on a corpus level but does not have its drawbacks for our per
sentence reward objective.
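
The description above can be read as a small algorithm. What follows is a minimal sketch of it in Python, not the implementation used by the metric this card documents. The function names `ngram_counts` and `sentence_gleu` are hypothetical. Two choices are assumptions that the description leaves open: repeated n-grams are matched by clipped multiset intersection, and an empty output or target scores 0.

```python
from collections import Counter


def ngram_counts(tokens, min_len=1, max_len=4):
    """Count every contiguous n-gram of length min_len..max_len."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sentence_gleu(output, target, min_len=1, max_len=4):
    """Return min(precision, recall) over the n-grams of two token lists."""
    out_counts = ngram_counts(output, min_len, max_len)
    tgt_counts = ngram_counts(target, min_len, max_len)

    # Assumption: an n-gram repeated k times matches at most as often as it
    # occurs on the other side (multiset intersection).
    matches = sum((out_counts & tgt_counts).values())
    total_out = sum(out_counts.values())
    total_tgt = sum(tgt_counts.values())

    # Assumption: with no n-grams on either side, the score is defined as 0.
    if total_out == 0 or total_tgt == 0:
        return 0.0

    precision = matches / total_out  # matching / n-grams in the output
    recall = matches / total_tgt     # matching / n-grams in the target
    return min(precision, recall)


output = "the cat sat on the mat".split()
target = "the cat is on the mat".split()
print(sentence_gleu(output, target))  # 0.5 (9 matching n-grams out of 18)
```

Swapping `output` and `target` exchanges precision and recall, so the minimum is unchanged, which is the symmetry the description mentions.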
@@ -1,6 +1,6 @@
 ---
 title: Google BLEU
-emoji: 🤗
+emoji: 🤗
 colorFrom: blue
 colorTo: red
 sdk: gradio
@@ -8,10 +8,25 @@ sdk_version: 3.0.2
 app_file: app.py
 pinned: false
 tags:
-- evaluate
-- metric
+- evaluate
+- metric
+description: |-
+  The BLEU score has some undesirable properties when used for single
+  sentences, as it was designed to be a corpus measure. We therefore
+  use a slightly different score for our RL experiments which we call
+  the 'GLEU score'. For the GLEU score, we record all sub-sequences of
+  1, 2, 3 or 4 tokens in output and target sequence (n-grams). We then
+  compute a recall, which is the ratio of the number of matching n-grams
+  to the number of total n-grams in the target (ground truth) sequence,
+  and a precision, which is the ratio of the number of matching n-grams
+  to the number of total n-grams in the generated output sequence. Then
+  GLEU score is simply the minimum of recall and precision. This GLEU
+  score's range is always between 0 (no matches) and 1 (all match) and
+  it is symmetrical when switching output and target. According to
+  our experiments, GLEU score correlates quite well with the BLEU
+  metric on a corpus level but does not have its drawbacks for our per
+  sentence reward objective.
 ---
-
 # Metric Card for Google BLEU
 
 