lvwerra (HF staff) committed
Commit
5b46ada
1 Parent(s): 468ed0c

Update Space (evaluate main: 08eb01a4)

Files changed (4)
  1. README.md +140 -6
  2. app.py +6 -0
  3. requirements.txt +4 -0
  4. trec_eval.py +139 -0
README.md CHANGED
@@ -1,12 +1,146 @@
  ---
- title: Trec_eval
- emoji: 🚀
- colorFrom: indigo
- colorTo: red
  sdk: gradio
- sdk_version: 3.0.9
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces#reference
  ---
+ title: TREC Eval
+ datasets:
+ -
+ tags:
+ - evaluate
+ - metric
  sdk: gradio
+ sdk_version: 3.0.2
  app_file: app.py
  pinned: false
  ---

+ # Metric Card for TREC Eval
+
+ ## Metric Description
+
+ The TREC Eval metric combines a number of information retrieval metrics such as precision and normalized Discounted Cumulative Gain (nDCG). It scores rankings of retrieved documents against reference relevance judgments (qrels).
+
+ ## How to Use
+ ```python
+ from evaluate import load
+ trec_eval = load("trec_eval")
+ results = trec_eval.compute(predictions=[run], references=[qrel])
+ ```
+
+ ### Inputs
+ - **predictions** *(dict): a single retrieval run; see the sketch after this list for one way to build it.*
+     - **query** *(int): Query ID.*
+     - **q0** *(str): Literal `"q0"`.*
+     - **docid** *(str): Document ID.*
+     - **rank** *(int): Rank of document.*
+     - **score** *(float): Score of document.*
+     - **system** *(str): Tag for current run.*
+ - **references** *(dict): a single qrel.*
+     - **query** *(int): Query ID.*
+     - **q0** *(str): Literal `"q0"`.*
+     - **docid** *(str): Document ID.*
+     - **rel** *(int): Relevance of document.*
+
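+ The run and qrel dictionaries use flat, aligned columns. As a rough sketch (the `rankings_to_run` helper and the sample data below are illustrative, not part of the metric), per-query rankings can be flattened into the run format like this:
+
+ ```python
+ def rankings_to_run(rankings, system="my_system"):
+     """Flatten {query_id: [(docid, score), ...]} into the flat column format above."""
+     run = {"query": [], "q0": [], "docid": [], "rank": [], "score": [], "system": []}
+     for query_id, ranked_docs in rankings.items():
+         # Documents are ordered by descending score; ranks start at 0 as in the examples below.
+         for rank, (docid, score) in enumerate(sorted(ranked_docs, key=lambda d: -d[1])):
+             run["query"].append(query_id)
+             run["q0"].append("q0")
+             run["docid"].append(docid)
+             run["rank"].append(rank)
+             run["score"].append(score)
+             run["system"].append(system)
+     return run
+
+ run = rankings_to_run({0: [("doc_2", 1.5), ("doc_1", 1.2)]})
+ ```
+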
+ ### Output Values
+ - **runid** *(str): Run name.*
+ - **num_ret** *(int): Number of retrieved documents.*
+ - **num_rel** *(int): Number of relevant documents.*
+ - **num_rel_ret** *(int): Number of retrieved relevant documents.*
+ - **num_q** *(int): Number of queries.*
+ - **map** *(float): Mean average precision.*
+ - **gm_map** *(float): Geometric mean average precision.*
+ - **bpref** *(float): Binary preference score.*
+ - **Rprec** *(float): Precision@R, where R is the number of relevant documents.*
+ - **recip_rank** *(float): Reciprocal rank.*
+ - **P@k** *(float): Precision@k (k in [5, 10, 15, 20, 30, 100, 200, 500, 1000]).*
+ - **NDCG@k** *(float): nDCG@k (k in [5, 10, 15, 20, 30, 100, 200, 500, 1000]).*
+
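+ The cutoff-based scores are returned as separate flat keys; a small sketch (assuming a `results` dict as computed in the Examples section below) for gathering them per cutoff:
+
+ ```python
+ cutoffs = [5, 10, 15, 20, 30, 100, 200, 500, 1000]
+ # Collect precision and nDCG at each cutoff into one dict for easier inspection.
+ by_cutoff = {k: {"P": results[f"P@{k}"], "NDCG": results[f"NDCG@{k}"]} for k in cutoffs}
+ ```
+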
+ ### Examples
+
+ A minimal example looks as follows:
+ ```python
+ import evaluate
+
+ qrel = {
+     "query": [0],
+     "q0": ["q0"],
+     "docid": ["doc_1"],
+     "rel": [2]
+ }
+ run = {
+     "query": [0, 0],
+     "q0": ["q0", "q0"],
+     "docid": ["doc_2", "doc_1"],
+     "rank": [0, 1],
+     "score": [1.5, 1.2],
+     "system": ["test", "test"]
+ }
+
+ trec_eval = evaluate.load("trec_eval")
+ results = trec_eval.compute(references=[qrel], predictions=[run])
+ print(results["P@5"])
+ # 0.2
+ ```
+
+ A more realistic use case, with example files from [`trectools`](https://github.com/joaopalotti/trectools):
+
+ ```python
+ import evaluate
+ import pandas as pd
+
+ qrel = pd.read_csv("robust03_qrels.txt", sep=r"\s+", names=["query", "q0", "docid", "rel"])
+ qrel["q0"] = qrel["q0"].astype(str)
+ qrel = qrel.to_dict(orient="list")
+
+ run = pd.read_csv("input.InexpC2", sep=r"\s+", names=["query", "q0", "docid", "rank", "score", "system"])
+ run = run.to_dict(orient="list")
+
+ trec_eval = evaluate.load("trec_eval")
+ result = trec_eval.compute(predictions=[run], references=[qrel])
+ ```
+
+ ```python
+ result
+
+ {'runid': 'InexpC2',
+  'num_ret': 100000,
+  'num_rel': 6074,
+  'num_rel_ret': 3198,
+  'num_q': 100,
+  'map': 0.22485930431817494,
+  'gm_map': 0.10411523825735523,
+  'bpref': 0.217511695914079,
+  'Rprec': 0.2502547201167236,
+  'recip_rank': 0.6646545943335417,
+  'P@5': 0.44,
+  'P@10': 0.37,
+  'P@15': 0.34600000000000003,
+  'P@20': 0.30999999999999994,
+  'P@30': 0.2563333333333333,
+  'P@100': 0.1428,
+  'P@200': 0.09510000000000002,
+  'P@500': 0.05242,
+  'P@1000': 0.03198,
+  'NDCG@5': 0.4101480395089769,
+  'NDCG@10': 0.3806761417784469,
+  'NDCG@15': 0.37819463408955706,
+  'NDCG@20': 0.3686080836061317,
+  'NDCG@30': 0.352474353427451,
+  'NDCG@100': 0.3778329431025776,
+  'NDCG@200': 0.4119129817248979,
+  'NDCG@500': 0.4585354576461375,
+  'NDCG@1000': 0.49092149290805653}
+ ```
+
+ ## Limitations and Bias
+ The `trec_eval` metric requires predictions to be in the TREC run format and references to be in the TREC qrel format.
+
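+ The module also scores exactly one run against one qrel per `compute` call and raises a `ValueError` otherwise, so comparing several systems means one call per run. A minimal sketch, assuming a hypothetical `runs` dict mapping system names to run dicts:
+
+ ```python
+ # One compute call per run; all runs are judged against the same qrel.
+ scores = {name: trec_eval.compute(predictions=[run], references=[qrel]) for name, run in runs.items()}
+ ```
+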
+ ## Citation
+
+ ```bibtex
+ @inproceedings{palotti2019,
+    author = {Palotti, Joao and Scells, Harrisen and Zuccon, Guido},
+    title = {TrecTools: an open-source Python library for Information Retrieval practitioners involved in TREC-like campaigns},
+    series = {SIGIR'19},
+    year = {2019},
+    location = {Paris, France},
+    publisher = {ACM}
+ }
+ ```
+
+ ## Further References
+
+ - Homepage: https://github.com/joaopalotti/trectools
app.py ADDED
@@ -0,0 +1,6 @@
+ import evaluate
+ from evaluate.utils import launch_gradio_widget
+
+
+ module = evaluate.load("lvwerra/trec_eval")
+ launch_gradio_widget(module)
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ # TODO: replace the GitHub dependency with a released version of evaluate
+ git+https://github.com/huggingface/evaluate.git@main
+ datasets~=2.0
+ trectools
trec_eval.py ADDED
@@ -0,0 +1,139 @@
+ # Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """Module to compute TREC evaluation scores."""
+
+ import datasets
+ import pandas as pd
+ from trectools import TrecEval, TrecQrel, TrecRun
+
+ import evaluate
+
+
+ _CITATION = """\
+ @inproceedings{palotti2019,
+    author = {Palotti, Joao and Scells, Harrisen and Zuccon, Guido},
+    title = {TrecTools: an open-source Python library for Information Retrieval practitioners involved in TREC-like campaigns},
+    series = {SIGIR'19},
+    year = {2019},
+    location = {Paris, France},
+    publisher = {ACM}
+ }
+ """
+
+ _DESCRIPTION = """\
+ The TREC Eval metric combines a number of information retrieval metrics such as \
+ precision and nDCG. It is used to score rankings of retrieved documents with reference values."""
+
+
+ _KWARGS_DESCRIPTION = """
+ Calculates TREC evaluation scores based on a run and qrel.
+ Args:
+     predictions: list containing a single run.
+     references: list containing a single qrel.
+ Returns:
+     dict: TREC evaluation scores.
+ Examples:
+     >>> trec = evaluate.load("trec_eval")
+     >>> qrel = {
+     ...     "query": [0],
+     ...     "q0": ["q0"],
+     ...     "docid": ["doc_1"],
+     ...     "rel": [2]
+     ... }
+     >>> run = {
+     ...     "query": [0, 0],
+     ...     "q0": ["q0", "q0"],
+     ...     "docid": ["doc_2", "doc_1"],
+     ...     "rank": [0, 1],
+     ...     "score": [1.5, 1.2],
+     ...     "system": ["test", "test"]
+     ... }
+     >>> results = trec.compute(references=[qrel], predictions=[run])
+     >>> print(results["P@5"])
+     0.2
+ """
+
+
+ @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
+ class TRECEval(evaluate.EvaluationModule):
+     """Compute TREC evaluation scores."""
+
+     def _info(self):
+         return evaluate.EvaluationModuleInfo(
+             module_type="metric",
+             description=_DESCRIPTION,
+             citation=_CITATION,
+             inputs_description=_KWARGS_DESCRIPTION,
+             features=datasets.Features(
+                 {
+                     "predictions": {
+                         "query": datasets.Sequence(datasets.Value("int64")),
+                         "q0": datasets.Sequence(datasets.Value("string")),
+                         "docid": datasets.Sequence(datasets.Value("string")),
+                         "rank": datasets.Sequence(datasets.Value("int64")),
+                         "score": datasets.Sequence(datasets.Value("float")),
+                         "system": datasets.Sequence(datasets.Value("string")),
+                     },
+                     "references": {
+                         "query": datasets.Sequence(datasets.Value("int64")),
+                         "q0": datasets.Sequence(datasets.Value("string")),
+                         "docid": datasets.Sequence(datasets.Value("string")),
+                         "rel": datasets.Sequence(datasets.Value("int64")),
+                     },
+                 }
+             ),
+             homepage="https://github.com/joaopalotti/trectools",
+         )
+
+     def _compute(self, references, predictions):
+         """Returns the TREC evaluation scores."""
+
+         if len(predictions) > 1 or len(references) > 1:
+             raise ValueError(
+                 f"You can only pass one prediction and reference per evaluation. You passed {len(predictions)} prediction(s) and {len(references)} reference(s)."
+             )
+
+         # Build trectools objects from the in-memory dataframes instead of reading TREC-formatted files.
+         df_run = pd.DataFrame(predictions[0])
+         df_qrel = pd.DataFrame(references[0])
+
+         trec_run = TrecRun()
+         trec_run.filename = "placeholder.file"
+         trec_run.run_data = df_run
+
+         trec_qrel = TrecQrel()
+         trec_qrel.filename = "placeholder.file"
+         trec_qrel.qrels_data = df_qrel
+
+         trec_eval = TrecEval(trec_run, trec_qrel)
+
+         # Aggregate (non-per-query) scores, mirroring the classic trec_eval output.
+         result = {}
+         result["runid"] = trec_eval.run.get_runid()
+         result["num_ret"] = trec_eval.get_retrieved_documents(per_query=False)
+         result["num_rel"] = trec_eval.get_relevant_documents(per_query=False)
+         result["num_rel_ret"] = trec_eval.get_relevant_retrieved_documents(per_query=False)
+         result["num_q"] = len(trec_eval.run.topics())
+         result["map"] = trec_eval.get_map(depth=10000, per_query=False, trec_eval=True)
+         result["gm_map"] = trec_eval.get_geometric_map(depth=10000, trec_eval=True)
+         result["bpref"] = trec_eval.get_bpref(depth=1000, per_query=False, trec_eval=True)
+         result["Rprec"] = trec_eval.get_rprec(depth=1000, per_query=False, trec_eval=True)
+         result["recip_rank"] = trec_eval.get_reciprocal_rank(depth=1000, per_query=False, trec_eval=True)
+
+         # Precision and nDCG at the standard cutoffs.
+         for v in [5, 10, 15, 20, 30, 100, 200, 500, 1000]:
+             result[f"P@{v}"] = trec_eval.get_precision(depth=v, per_query=False, trec_eval=True)
+         for v in [5, 10, 15, 20, 30, 100, 200, 500, 1000]:
+             result[f"NDCG@{v}"] = trec_eval.get_ndcg(depth=v, per_query=False, trec_eval=True)
+
+         return result