julien-c HF staff committed on
Commit
6423e29
1 Parent(s): f4c04ef

Add description to card metadata


The BLEU score has some undesirable properties when used for single
sentences, as it was designed to be a corpus measure. We therefore
use a slightly different score for our RL experiments which we call
the 'GLEU score'. For the GLEU score, we record all sub-sequences of
1, 2, 3 or 4 tokens in output and target sequence (n-grams). We then
compute a recall, which is the ratio of the number of matching n-grams
to the number of total n-grams in the target (ground truth) sequence,
and a precision, which is the ratio of the number of matching n-grams
to the number of total n-grams in the generated output sequence. Then
GLEU score is simply the minimum of recall and precision. This GLEU
score's range is always between 0 (no matches) and 1 (all match) and
it is symmetrical when switching output and target. According to
our experiments, GLEU score correlates quite well with the BLEU
metric on a corpus level but does not have its drawbacks for our per
sentence reward objective.
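
The computation described above can be sketched in a few lines of Python. This is a minimal illustration and not the code behind this metric card: the function names `ngram_counts` and `sentence_gleu` are hypothetical, tokens are assumed to be pre-split lists of strings, and matches are assumed to be clipped (an n-gram counts as matching at most as many times as it occurs in both sequences), a detail the description does not spell out.

```python
from collections import Counter


def ngram_counts(tokens, min_len=1, max_len=4):
    """Count every contiguous n-gram of length min_len..max_len."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sentence_gleu(target, output, min_len=1, max_len=4):
    """Return min(recall, precision) over the 1- to 4-grams of two token lists."""
    target_counts = ngram_counts(target, min_len, max_len)
    output_counts = ngram_counts(output, min_len, max_len)
    # Assumption: clipped matching via multiset intersection of the two counters.
    matches = sum((target_counts & output_counts).values())
    total_target = sum(target_counts.values())
    total_output = sum(output_counts.values())
    if total_target == 0 or total_output == 0:
        return 0.0  # an empty sequence has no n-grams to match
    recall = matches / total_target      # matches / n-grams in the target
    precision = matches / total_output   # matches / n-grams in the output
    return min(recall, precision)


target = "the cat sat on the mat".split()
output = "the cat sat on a mat".split()
score = sentence_gleu(target, output)
```

Here both sentences contain 18 n-grams of length 1 to 4, of which 11 match, so recall and precision are both 11/18. Because the score takes the minimum of the two ratios, swapping `target` and `output` leaves it unchanged, which is the symmetry property the description mentions.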

Files changed (1)
  1. README.md +19 -4
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  title: Google BLEU
- emoji: 🤗
+ emoji: 🤗
  colorFrom: blue
  colorTo: red
  sdk: gradio
@@ -8,10 +8,25 @@ sdk_version: 3.0.2
  app_file: app.py
  pinned: false
  tags:
- - evaluate
- - metric
+ - evaluate
+ - metric
+ description: |-
+ The BLEU score has some undesirable properties when used for single
+ sentences, as it was designed to be a corpus measure. We therefore
+ use a slightly different score for our RL experiments which we call
+ the 'GLEU score'. For the GLEU score, we record all sub-sequences of
+ 1, 2, 3 or 4 tokens in output and target sequence (n-grams). We then
+ compute a recall, which is the ratio of the number of matching n-grams
+ to the number of total n-grams in the target (ground truth) sequence,
+ and a precision, which is the ratio of the number of matching n-grams
+ to the number of total n-grams in the generated output sequence. Then
+ GLEU score is simply the minimum of recall and precision. This GLEU
+ score's range is always between 0 (no matches) and 1 (all match) and
+ it is symmetrical when switching output and target. According to
+ our experiments, GLEU score correlates quite well with the BLEU
+ metric on a corpus level but does not have its drawbacks for our per
+ sentence reward objective.
  ---
-
  # Metric Card for Google BLEU