lvwerra (HF staff) committed
Commit
5b46ada
1 Parent(s): 468ed0c

Update Space (evaluate main: 08eb01a4)

Files changed (4)
  1. README.md +140 -6
  2. app.py +6 -0
  3. requirements.txt +4 -0
  4. trec_eval.py +139 -0
README.md CHANGED
@@ -1,12 +1,146 @@
  ---
- title: Trec_eval
- emoji: 🚀
- colorFrom: indigo
- colorTo: red
  sdk: gradio
- sdk_version: 3.0.9
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces#reference
  ---
+ title: TREC Eval
+ datasets:
+ -
+ tags:
+ - evaluate
+ - metric
  sdk: gradio
+ sdk_version: 3.0.2
  app_file: app.py
  pinned: false
  ---

+ # Metric Card for TREC Eval
+
+ ## Metric Description
+
+ The TREC Eval metric combines a number of information retrieval metrics such as precision and normalized Discounted Cumulative Gain (nDCG). It scores rankings of retrieved documents against reference relevance judgments (qrels).
+
+ ## How to Use
+ ```python
+ from evaluate import load
+ trec_eval = load("trec_eval")
+ results = trec_eval.compute(predictions=[run], references=[qrel])
+ ```
+
+ ### Inputs
+ - **predictions** *(dict): a single retrieval run; see the sketch after this list for one way to build it.*
+     - **query** *(int): Query ID.*
+     - **q0** *(str): Literal `"q0"`.*
+     - **docid** *(str): Document ID.*
+     - **rank** *(int): Rank of document.*
+     - **score** *(float): Score of document.*
+     - **system** *(str): Tag for current run.*
+ - **references** *(dict): a single qrel.*
+     - **query** *(int): Query ID.*
+     - **q0** *(str): Literal `"q0"`.*
+     - **docid** *(str): Document ID.*
+     - **rel** *(int): Relevance of document.*
+
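+ The run and qrel dictionaries use flat, aligned columns. As a rough sketch (the `rankings_to_run` helper and the sample data below are illustrative, not part of the metric), per-query rankings can be flattened into the run format like this:
+
+ ```python
+ def rankings_to_run(rankings, system="my_system"):
+     """Flatten {query_id: [(docid, score), ...]} into the flat column format above."""
+     run = {"query": [], "q0": [], "docid": [], "rank": [], "score": [], "system": []}
+     for query_id, ranked_docs in rankings.items():
+         # Documents are ordered by descending score; ranks start at 0 as in the examples below.
+         for rank, (docid, score) in enumerate(sorted(ranked_docs, key=lambda d: -d[1])):
+             run["query"].append(query_id)
+             run["q0"].append("q0")
+             run["docid"].append(docid)
+             run["rank"].append(rank)
+             run["score"].append(score)
+             run["system"].append(system)
+     return run
+
+ run = rankings_to_run({0: [("doc_2", 1.5), ("doc_1", 1.2)]})
+ ```
+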
+ ### Output Values
+ - **runid** *(str): Run name.*
+ - **num_ret** *(int): Number of retrieved documents.*
+ - **num_rel** *(int): Number of relevant documents.*
+ - **num_rel_ret** *(int): Number of retrieved relevant documents.*
+ - **num_q** *(int): Number of queries.*
+ - **map** *(float): Mean average precision.*
+ - **gm_map** *(float): Geometric mean average precision.*
+ - **bpref** *(float): Binary preference score.*
+ - **Rprec** *(float): Precision@R, where R is the number of relevant documents.*
+ - **recip_rank** *(float): Reciprocal rank.*
+ - **P@k** *(float): Precision@k (k in [5, 10, 15, 20, 30, 100, 200, 500, 1000]).*
+ - **NDCG@k** *(float): nDCG@k (k in [5, 10, 15, 20, 30, 100, 200, 500, 1000]).*
+
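+ The cutoff-based scores are returned as separate flat keys; a small sketch (assuming a `results` dict as computed in the Examples section below) for gathering them per cutoff:
+
+ ```python
+ cutoffs = [5, 10, 15, 20, 30, 100, 200, 500, 1000]
+ # Collect precision and nDCG at each cutoff into one dict for easier inspection.
+ by_cutoff = {k: {"P": results[f"P@{k}"], "NDCG": results[f"NDCG@{k}"]} for k in cutoffs}
+ ```
+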
+ ### Examples
+
+ A minimal example looks as follows:
+ ```python
+ import evaluate
+
+ qrel = {
+     "query": [0],
+     "q0": ["q0"],
+     "docid": ["doc_1"],
+     "rel": [2]
+ }
+ run = {
+     "query": [0, 0],
+     "q0": ["q0", "q0"],
+     "docid": ["doc_2", "doc_1"],
+     "rank": [0, 1],
+     "score": [1.5, 1.2],
+     "system": ["test", "test"]
+ }
+
+ trec_eval = evaluate.load("trec_eval")
+ results = trec_eval.compute(references=[qrel], predictions=[run])
+ print(results["P@5"])
+ # 0.2
+ ```
+
+ A more realistic use case, with example files from [`trectools`](https://github.com/joaopalotti/trectools):
+
+ ```python
+ import evaluate
+ import pandas as pd
+
+ qrel = pd.read_csv("robust03_qrels.txt", sep=r"\s+", names=["query", "q0", "docid", "rel"])
+ qrel["q0"] = qrel["q0"].astype(str)
+ qrel = qrel.to_dict(orient="list")
+
+ run = pd.read_csv("input.InexpC2", sep=r"\s+", names=["query", "q0", "docid", "rank", "score", "system"])
+ run = run.to_dict(orient="list")
+
+ trec_eval = evaluate.load("trec_eval")
+ result = trec_eval.compute(predictions=[run], references=[qrel])
+ ```
+
+ ```python
+ result
+
+ {'runid': 'InexpC2',
+  'num_ret': 100000,
+  'num_rel': 6074,
+  'num_rel_ret': 3198,
+  'num_q': 100,
+  'map': 0.22485930431817494,
+  'gm_map': 0.10411523825735523,
+  'bpref': 0.217511695914079,
+  'Rprec': 0.2502547201167236,
+  'recip_rank': 0.6646545943335417,
+  'P@5': 0.44,
+  'P@10': 0.37,
+  'P@15': 0.34600000000000003,
+  'P@20': 0.30999999999999994,
+  'P@30': 0.2563333333333333,
+  'P@100': 0.1428,
+  'P@200': 0.09510000000000002,
+  'P@500': 0.05242,
+  'P@1000': 0.03198,
+  'NDCG@5': 0.4101480395089769,
+  'NDCG@10': 0.3806761417784469,
+  'NDCG@15': 0.37819463408955706,
+  'NDCG@20': 0.3686080836061317,
+  'NDCG@30': 0.352474353427451,
+  'NDCG@100': 0.3778329431025776,
+  'NDCG@200': 0.4119129817248979,
+  'NDCG@500': 0.4585354576461375,
+  'NDCG@1000': 0.49092149290805653}
+ ```
+
+ ## Limitations and Bias
+ The `trec_eval` metric requires predictions to be in the TREC run format and references to be in the TREC qrel format.
+
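+ The module also scores exactly one run against one qrel per `compute` call and raises a `ValueError` otherwise, so comparing several systems means one call per run. A minimal sketch, assuming a hypothetical `runs` dict mapping system names to run dicts:
+
+ ```python
+ # One compute call per run; all runs are judged against the same qrel.
+ scores = {name: trec_eval.compute(predictions=[run], references=[qrel]) for name, run in runs.items()}
+ ```
+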
+ ## Citation
+
+ ```bibtex
+ @inproceedings{palotti2019,
+    author = {Palotti, Joao and Scells, Harrisen and Zuccon, Guido},
+    title = {TrecTools: an open-source Python library for Information Retrieval practitioners involved in TREC-like campaigns},
+    series = {SIGIR'19},
+    year = {2019},
+    location = {Paris, France},
+    publisher = {ACM}
+ }
+ ```
+
+ ## Further References
+
+ - Homepage: https://github.com/joaopalotti/trectools
app.py ADDED
@@ -0,0 +1,6 @@
+ import evaluate
+ from evaluate.utils import launch_gradio_widget
+
+
+ module = evaluate.load("lvwerra/trec_eval")
+ launch_gradio_widget(module)
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ # TODO: replace the GitHub dependency with a released version of evaluate
+ git+https://github.com/huggingface/evaluate.git@main
+ datasets~=2.0
+ trectools
trec_eval.py ADDED
@@ -0,0 +1,139 @@
+ # Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """Module to compute TREC evaluation scores."""
+
+ import datasets
+ import pandas as pd
+ from trectools import TrecEval, TrecQrel, TrecRun
+
+ import evaluate
+
+
+ _CITATION = """\
+ @inproceedings{palotti2019,
+    author = {Palotti, Joao and Scells, Harrisen and Zuccon, Guido},
+    title = {TrecTools: an open-source Python library for Information Retrieval practitioners involved in TREC-like campaigns},
+    series = {SIGIR'19},
+    year = {2019},
+    location = {Paris, France},
+    publisher = {ACM}
+ }
+ """
+
+ _DESCRIPTION = """\
+ The TREC Eval metric combines a number of information retrieval metrics such as \
+ precision and nDCG. It is used to score rankings of retrieved documents with reference values."""
+
+
+ _KWARGS_DESCRIPTION = """
+ Calculates TREC evaluation scores based on a run and qrel.
+ Args:
+     predictions: list containing a single run.
+     references: list containing a single qrel.
+ Returns:
+     dict: TREC evaluation scores.
+ Examples:
+     >>> trec = evaluate.load("trec_eval")
+     >>> qrel = {
+     ...     "query": [0],
+     ...     "q0": ["q0"],
+     ...     "docid": ["doc_1"],
+     ...     "rel": [2]
+     ... }
+     >>> run = {
+     ...     "query": [0, 0],
+     ...     "q0": ["q0", "q0"],
+     ...     "docid": ["doc_2", "doc_1"],
+     ...     "rank": [0, 1],
+     ...     "score": [1.5, 1.2],
+     ...     "system": ["test", "test"]
+     ... }
+     >>> results = trec.compute(references=[qrel], predictions=[run])
+     >>> print(results["P@5"])
+     0.2
+ """
+
+
+ @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
+ class TRECEval(evaluate.EvaluationModule):
+     """Compute TREC evaluation scores."""
+
+     def _info(self):
+         return evaluate.EvaluationModuleInfo(
+             module_type="metric",
+             description=_DESCRIPTION,
+             citation=_CITATION,
+             inputs_description=_KWARGS_DESCRIPTION,
+             features=datasets.Features(
+                 {
+                     "predictions": {
+                         "query": datasets.Sequence(datasets.Value("int64")),
+                         "q0": datasets.Sequence(datasets.Value("string")),
+                         "docid": datasets.Sequence(datasets.Value("string")),
+                         "rank": datasets.Sequence(datasets.Value("int64")),
+                         "score": datasets.Sequence(datasets.Value("float")),
+                         "system": datasets.Sequence(datasets.Value("string")),
+                     },
+                     "references": {
+                         "query": datasets.Sequence(datasets.Value("int64")),
+                         "q0": datasets.Sequence(datasets.Value("string")),
+                         "docid": datasets.Sequence(datasets.Value("string")),
+                         "rel": datasets.Sequence(datasets.Value("int64")),
+                     },
+                 }
+             ),
+             homepage="https://github.com/joaopalotti/trectools",
+         )
+
+     def _compute(self, references, predictions):
+         """Returns the TREC evaluation scores."""
+
+         if len(predictions) > 1 or len(references) > 1:
+             raise ValueError(
+                 f"You can only pass one prediction and reference per evaluation. You passed {len(predictions)} prediction(s) and {len(references)} reference(s)."
+             )
+
+         # Build trectools objects from the in-memory dataframes instead of reading TREC-formatted files.
+         df_run = pd.DataFrame(predictions[0])
+         df_qrel = pd.DataFrame(references[0])
+
+         trec_run = TrecRun()
+         trec_run.filename = "placeholder.file"
+         trec_run.run_data = df_run
+
+         trec_qrel = TrecQrel()
+         trec_qrel.filename = "placeholder.file"
+         trec_qrel.qrels_data = df_qrel
+
+         trec_eval = TrecEval(trec_run, trec_qrel)
+
+         # Aggregate (non-per-query) scores, mirroring the classic trec_eval output.
+         result = {}
+         result["runid"] = trec_eval.run.get_runid()
+         result["num_ret"] = trec_eval.get_retrieved_documents(per_query=False)
+         result["num_rel"] = trec_eval.get_relevant_documents(per_query=False)
+         result["num_rel_ret"] = trec_eval.get_relevant_retrieved_documents(per_query=False)
+         result["num_q"] = len(trec_eval.run.topics())
+         result["map"] = trec_eval.get_map(depth=10000, per_query=False, trec_eval=True)
+         result["gm_map"] = trec_eval.get_geometric_map(depth=10000, trec_eval=True)
+         result["bpref"] = trec_eval.get_bpref(depth=1000, per_query=False, trec_eval=True)
+         result["Rprec"] = trec_eval.get_rprec(depth=1000, per_query=False, trec_eval=True)
+         result["recip_rank"] = trec_eval.get_reciprocal_rank(depth=1000, per_query=False, trec_eval=True)
+
+         # Precision and nDCG at the standard cutoffs.
+         for v in [5, 10, 15, 20, 30, 100, 200, 500, 1000]:
+             result[f"P@{v}"] = trec_eval.get_precision(depth=v, per_query=False, trec_eval=True)
+         for v in [5, 10, 15, 20, 30, 100, 200, 500, 1000]:
+             result[f"NDCG@{v}"] = trec_eval.get_ndcg(depth=v, per_query=False, trec_eval=True)
+
+         return result