Spaces · Sleeping

antonioalegria committed · Commit 4912e21 · Parent(s): abb1d54

Added the Balanced Accuracy logic.
- Based on scikit-learn's balanced_accuracy_score.
- Based on the accuracy template.

Files changed:
- README.md +83 -22
- balanced_accuracy.py +94 -55
- requirements.txt +2 -1
README.md
CHANGED
@@ -1,50 +1,111 @@
 ---
-title:
-- evaluate
-- metric
-description: "TODO: add a description here"
+title: Accuracy
+emoji: 🤗
+colorFrom: blue
+colorTo: red
 sdk: gradio
 sdk_version: 3.19.1
 app_file: app.py
 pinned: false
+tags:
+- evaluate
+- metric
+description: >-
+  Balanced Accuracy is the average of recall obtained on each class. It can be computed with:
+  Balanced Accuracy = (TPR + TNR) / N
+  Where:
+  TPR: True positive rate
+  TNR: True negative rate
+  N: Number of classes
 ---

 # Metric Card for Balanced Accuracy

-***Module Card Instructions:*** *Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*
-
 ## Metric Description
+
+Balanced Accuracy is the average of recall obtained on each class. It can be computed with:
+Balanced Accuracy = (TPR + TNR) / N
+Where:
+TPR: True positive rate
+TNR: True negative rate
+N: Number of classes

 ## How to Use
-*Give general statement of how to use the metric*
+
+At minimum, this metric requires predictions and references as inputs.
+
+```python
+>>> accuracy_metric = evaluate.load("hyperml/balanced_accuracy")
+>>> results = accuracy_metric.compute(references=[0, 1], predictions=[0, 1])
+>>> print(results)
+{'balanced_accuracy': 1.0}
+```

 ### Inputs
+
+**predictions** (list of int): Predicted labels.
+**references** (list of int): Ground truth labels.
+**sample_weight** (list of float): Sample weights. Defaults to None.
+**adjusted** (boolean): If set to True, adjusts the score by accounting for chance. Useful in handling imbalanced datasets. Defaults to False.

 ### Output Values

+- **balanced_accuracy** (float): Balanced Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0. A higher score means higher balanced accuracy.
+
+Output Example(s):
+```python
+{'balanced_accuracy': 1.0}
+```
+
+This metric outputs a dictionary containing the balanced accuracy score.

 #### Values from Popular Papers
+
+Balanced accuracy is often used to report performance on supervised classification tasks such as sentiment analysis or fraud detection, where there is a severe imbalance in the classes.

 ### Examples
+
+Example 1-A simple example
+```python
+>>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+>>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
+>>> print(results)
+{'balanced_accuracy': 0.5}
+```
+
+Example 2-The same as Example 1, except with `sample_weight` set.
+```python
+>>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+>>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4])
+>>> print(results)
+{'balanced_accuracy': 0.8778625954198473} # TODO: check if this is correct
+```
+
+Example 3-The same as Example 1, except with `adjusted` set to `True`.
+```python
+>>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+>>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], adjusted=True)
+>>> print(results)
+{'balanced_accuracy': 0.8} # TODO: check if this is correct
+```

 ## Limitations and Bias
-*Note any known limitations or biases that the metric has, with links and references if possible.*
+
+The balanced accuracy metric has limitations when it comes to extreme cases such as perfectly balanced or highly imbalanced datasets. For example, in perfectly balanced datasets, it behaves the same as standard accuracy. However, in highly imbalanced datasets where a class has very few samples, a small change in the prediction for that class can cause a large change in the balanced accuracy score.
+
+## Citation(s)
+```bibtex
+@article{scikit-learn,
+  title={Scikit-learn: Machine Learning in {P}ython},
+  author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+          and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+          and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+          Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+  journal={Journal of Machine Learning Research},
+  volume={12},
+  pages={2825--2830},
+  year={2011}
+}
+```

 ## Further References
-*Add any useful further references.*
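The example outputs flagged above with "# TODO: check if this is correct" can be recomputed directly against scikit-learn, which this metric wraps. A minimal verification sketch (not part of the committed files; the `recall_score` call is only an illustration that balanced accuracy is the mean of per-class recall):

```python
# Recompute the metric card's examples straight from scikit-learn to fill in the TODOs.
from sklearn.metrics import balanced_accuracy_score, recall_score

references = [0, 1, 2, 0, 1, 2]
predictions = [0, 1, 1, 2, 1, 0]

# Balanced accuracy equals the macro-averaged (per-class) recall.
print(balanced_accuracy_score(references, predictions))        # 0.5, as in Example 1
print(recall_score(references, predictions, average="macro"))  # same value by definition

# Reference values for Example 2 (sample_weight) and Example 3 (adjusted).
print(balanced_accuracy_score(references, predictions, sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4]))
print(balanced_accuracy_score(references, predictions, adjusted=True))
```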
balanced_accuracy.py
CHANGED
@@ -1,4 +1,4 @@
-# Copyright
+# Copyright 2023 HyperML Authors and the current HyperML contributor.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -11,85 +11,124 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-"""
+"""Balanced Accuracy metric."""

 import evaluate
 import datasets
+from sklearn.metrics import accuracy_score
+from sklearn.metrics import balanced_accuracy_score


-# TODO: Add BibTeX citation
-_CITATION = """\
-@InProceedings{huggingface:module,
-title = {A great new module},
-authors={huggingface, Inc.},
-year={2020}
-}
-"""
-
+_DESCRIPTION = """
+Balanced Accuracy is the average of recall obtained on each class. It can be computed with:
+Balanced Accuracy = (TPR + TNR) / N
+Where:
+TPR: True positive rate
+TNR: True negative rate
+N: Number of classes
 """


-# TODO: Add description of the arguments of the module here
 _KWARGS_DESCRIPTION = """
-Calculates how good are predictions given some references, using certain scores
 Args:
-    predictions
+    predictions (`list` of `int`): Predicted labels.
+    references (`list` of `int`): Ground truth labels.
+    normalize (`boolean`): If set to False, returns the number of correctly classified samples. Otherwise, returns the fraction of correctly classified samples. Defaults to True.
+    sample_weight (`list` of `float`): Sample weights. Defaults to None.
+
 Returns:
-    accuracy:
+    accuracy (`float` or `int`): Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0, or the number of examples input, if `normalize` is set to `True`. A higher score means higher accuracy.
+
 Examples:
-    Examples should be written in doctest format, and should illustrate how
-    to use the function.

+    Example 1-A simple example
+        >>> accuracy_metric = evaluate.load("accuracy")
+        >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
+        >>> print(results)
+        {'accuracy': 0.5}
+
+    Example 2-The same as Example 1, except with `normalize` set to `False`.
+        >>> accuracy_metric = evaluate.load("accuracy")
+        >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], normalize=False)
+        >>> print(results)
+        {'accuracy': 3.0}
+
+    Example 3-The same as Example 1, except with `sample_weight` set.
+        >>> accuracy_metric = evaluate.load("accuracy")
+        >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4])
+        >>> print(results)
+        {'accuracy': 0.8778625954198473}
 """

+_KWARGS_DESCRIPTION = """
+Args:
+    predictions (`list` of `int`): Predicted labels.
+    references (`list` of `int`): Ground truth labels.
+    sample_weight (`list` of `float`): Sample weights. Defaults to None.
+    adjusted (`boolean`): When true, the result is adjusted for chance, so that random performance would score 0, while keeping perfect performance at a score of 1. Defaults to False.
+
+Returns:
+    balanced_accuracy (`float`): Balanced Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0. A higher score means higher balanced accuracy.
+
+Examples:
+
+    Example 1-A simple example
+        >>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+        >>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
+        >>> print(results)
+        {'balanced_accuracy': 0.5}
+
+    Example 2-The same as Example 1, except with `sample_weight` set.
+        >>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+        >>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4])
+        >>> print(results)
+        {'balanced_accuracy': 0.8778625954198473} # TODO: check if this is correct
+
+    Example 3-The same as Example 1, except with `adjusted` set to `True`.
+        >>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+        >>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], adjusted=True)
+        >>> print(results)
+        {'balanced_accuracy': 0.8} # TODO: check if this is correct
+"""
+
+_CITATION = """
+@article{scikit-learn,
+  title={Scikit-learn: Machine Learning in {P}ython},
+  author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+          and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+          and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+          Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+  journal={Journal of Machine Learning Research},
+  volume={12},
+  pages={2825--2830},
+  year={2011}
+}
+"""
+
+class BalancedAccuracy(evaluate.Metric):
     def _info(self):
-        # TODO: Specifies the evaluate.EvaluationModuleInfo object
         return evaluate.MetricInfo(
-            # This is the description that will appear on the modules page.
-            module_type="metric",
             description=_DESCRIPTION,
             citation=_CITATION,
             inputs_description=_KWARGS_DESCRIPTION,
+            features=datasets.Features(
+                {
+                    "predictions": datasets.Sequence(datasets.Value("int32")),
+                    "references": datasets.Sequence(datasets.Value("int32")),
+                }
+                if self.config_name == "multilabel"
+                else {
+                    "predictions": datasets.Value("int32"),
+                    "references": datasets.Value("int32"),
+                }
+            ),
+            reference_urls=["https://scikit-learn.org/stable/modules/generated/sklearn.metrics.balanced_accuracy_score.html"],
         )

-    def
-        """Optional: download external resources useful to compute the scores"""
-        # TODO: Download external resources if needed
-        pass
-
-    def _compute(self, predictions, references):
-        """Returns the scores"""
-        # TODO: Compute the different scores of the module
-        accuracy = sum(i == j for i, j in zip(predictions, references)) / len(predictions)
+    def _compute(self, predictions, references, sample_weight=None, adjusted=False):
         return {
+            "balanced_accuracy": float(
+                balanced_accuracy_score(references, predictions, sample_weight=sample_weight, adjusted=adjusted)
+            )
         }
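The `adjusted` flag that `_compute` forwards to `balanced_accuracy_score` applies scikit-learn's chance correction: the raw balanced accuracy is rescaled so that random guessing maps to 0 and perfect prediction stays at 1. A small sketch of that rescaling (not part of the commit; it assumes every class appears at least once in `references`):

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

references = [0, 1, 2, 0, 1, 2]
predictions = [0, 1, 1, 2, 1, 0]

raw = balanced_accuracy_score(references, predictions)
chance = 1.0 / len(np.unique(references))   # expected score of a random classifier
adjusted = (raw - chance) / (1.0 - chance)  # rescale so that chance -> 0, perfect -> 1

# The manual rescaling agrees with the library's adjusted score.
assert np.isclose(adjusted, balanced_accuracy_score(references, predictions, adjusted=True))
print(raw, adjusted)
```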
requirements.txt
CHANGED
@@ -1 +1,2 @@
-git+https://github.com/huggingface/evaluate@main
+git+https://github.com/huggingface/evaluate@main
+scikit-learn
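With scikit-learn added to the requirements, the module can be smoke-tested end to end. A hypothetical local check (it assumes the two requirements above are installed and that the metric is loaded through the `hyperml/balanced_accuracy` path used in the README):

```python
# Load the metric from the Space and run it on a toy example; the expected output
# is a dict with a single "balanced_accuracy" key, as documented in the README.
import evaluate

balanced_accuracy = evaluate.load("hyperml/balanced_accuracy")
result = balanced_accuracy.compute(
    references=[0, 1, 2, 0, 1, 2],
    predictions=[0, 1, 1, 2, 1, 0],
)
print(result)  # e.g. {'balanced_accuracy': 0.5}
```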