---
title: Multi Label Precision Recall Accuracy Fscore
tags:
  - evaluate
  - metric
description: >-
  Implementation of example based evaluation metrics for multi-label
  classification presented in Zhang and Zhou (2014).
sdk: gradio
sdk_version: 4.4.0
app_file: app.py
pinned: false
---

# Metric Card for Multi Label Precision Recall Accuracy Fscore

Implementation of example based evaluation metrics for multi-label classification presented in Zhang and Zhou (2014).

## How to Use

```python
>>> import evaluate
>>> multi_label_precision_recall_accuracy_fscore = evaluate.load("mdocekal/multi_label_precision_recall_accuracy_fscore")
>>> results = multi_label_precision_recall_accuracy_fscore.compute(
...     predictions=[
...         ["0", "1"],
...         ["1", "2"],
...         ["0", "1", "2"],
...     ],
...     references=[
...         ["0", "1"],
...         ["1", "2"],
...         ["0", "1", "2"],
...     ],
... )
>>> print(results)
{'precision': 1.0, 'recall': 1.0, 'accuracy': 1.0, 'fscore': 1.0}
```

There is also a `multiset` configuration, which computes the metrics for multi-label classification with repeated labels. It uses the same definitions as above, but operates on multisets of labels; thus, multiset intersection, union, and cardinality are used instead.

```python
>>> multi_label_precision_recall_accuracy_fscore = evaluate.load("mdocekal/multi_label_precision_recall_accuracy_fscore", config_name="multiset")
>>> results = multi_label_precision_recall_accuracy_fscore.compute(
...     predictions=[
...         [0, 1, 1],
...     ],
...     references=[
...         [1, 0, 1, 1, 0, 0],
...     ],
... )
>>> print(results)
{'precision': 1.0, 'recall': 0.5, 'accuracy': 0.5, 'fscore': 0.6666666666666666}
```
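Under the hood, the multiset variant can be expressed with `collections.Counter`, whose `&` and `|` operators implement multiset intersection and union. A minimal per-sample sketch (the helper name is illustrative, not part of this metric's API, and it skips the empty-input edge case):

```python
from collections import Counter


def multiset_scores(prediction, reference):
    """Per-sample multiset precision/recall/accuracy/fscore via Counter arithmetic."""
    pred, ref = Counter(prediction), Counter(reference)
    inter = sum((pred & ref).values())  # multiset intersection cardinality
    union = sum((pred | ref).values())  # multiset union cardinality
    return {
        "precision": inter / sum(pred.values()),
        "recall": inter / sum(ref.values()),
        "accuracy": inter / union,
        "fscore": 2 * inter / (sum(pred.values()) + sum(ref.values())),
    }
```

For the sample above, `pred = {1: 2, 0: 1}` and `ref = {1: 3, 0: 3}`, so the intersection has cardinality 3 and the union 6, reproducing the printed scores.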

## Inputs

- **predictions** (`list[list[Union[int, str]]]`): list of predictions to score. Each prediction should be a list of predicted labels.
- **references** (`list[list[Union[int, str]]]`): list of references, one per prediction. Each reference should be a list of reference labels.

## Output Values

This metric outputs a dictionary containing:

- precision
- recall
- accuracy
- fscore

If both the prediction and the reference are empty lists, the evaluation for the given sample will be:

```python
{
    "precision": 1.0,
    "recall": 1.0,
    "accuracy": 1.0,
    "fscore": 1.0
}
```
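The example-based definitions behind these outputs can be sketched with plain set operations, including the all-ones convention for an empty sample described above. The helper below is an illustrative sketch, not this metric's actual implementation:

```python
from statistics import mean


def example_based_scores(predictions, references):
    """Set-based example metrics (Zhang and Zhou, 2014), averaged over samples."""
    p, r, a, f = [], [], [], []
    for pred, ref in zip(predictions, references):
        pred, ref = set(pred), set(ref)
        if not pred and not ref:
            # convention: an empty prediction matching an empty reference is perfect
            p.append(1.0); r.append(1.0); a.append(1.0); f.append(1.0)
            continue
        inter = len(pred & ref)
        p.append(inter / len(pred) if pred else 0.0)
        r.append(inter / len(ref) if ref else 0.0)
        a.append(inter / len(pred | ref))
        f.append(2 * inter / (len(pred) + len(ref)))
    return {
        "precision": mean(p),
        "recall": mean(r),
        "accuracy": mean(a),
        "fscore": mean(f),
    }
```

Note that precision divides by the predicted-label count, recall by the reference-label count, and accuracy by the size of their union, so all four scores agree at 1.0 exactly when every sample's sets match.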

## Citation

```bibtex
@article{Zhang2014ARO,
  title={A Review on Multi-Label Learning Algorithms},
  author={Min-Ling Zhang and Zhi-Hua Zhou},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  year={2014},
  volume={26},
  pages={1819-1837},
  url={https://api.semanticscholar.org/CorpusID:1008003}
}
```