mfajcik committed on
Commit c980420 · 1 Parent(s): 43aa8ea

Update content.py

Files changed (1): content.py +1 -1
content.py CHANGED
@@ -90,7 +90,7 @@ We use the following metrics for the following tasks:
  On every task, for every metric, we test for statistical significance at α = 0.05, i.e., the probability that the performance of model A equals the performance of model B is estimated to be less than 0.05.
  We use the following tests, with varying statistical power:
  - accuracy and exact-match: one-tailed paired t-test,
- - average area under the curve: Bayesian test inspired by ((Goutte et al., 2005)[https://link.springer.com/chapter/10.1007/978-3-540-31865-1_25]),
+ - average area under the curve: Bayesian test inspired by [Goutte et al., 2005](https://link.springer.com/chapter/10.1007/978-3-540-31865-1_25),
  - summarization & perplexity: bootstrapping.
 
  ### Duel Scoring Mechanism, Win Score
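The bootstrapping test mentioned for summarization and perplexity can be sketched as a paired bootstrap over per-example score differences. This is a minimal illustration, not the repository's actual implementation: the function name, the resample count, and the decision rule (reject equality when the estimated probability falls below α = 0.05) are assumptions made for the example.

```python
import random

def bootstrap_p_value(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Paired bootstrap sketch (hypothetical helper, not the repo's code).

    Resamples per-example score differences (model A minus model B) with
    replacement and estimates the probability that A does not outperform B.
    Under the document's criterion, the difference is significant at
    alpha = 0.05 when this estimate is below 0.05.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    not_better = 0
    for _ in range(n_resamples):
        # Draw a bootstrap resample of the paired differences.
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        # Count resamples in which model A does not beat model B on average.
        if sum(resample) <= 0:
            not_better += 1
    return not_better / n_resamples
```

With clearly separated scores the estimate goes to 0 (significant), while with the ordering reversed it goes to 1 (not significant); real per-example metric scores would sit in between.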