antonioalegria committed on
Commit
4912e21
1 Parent(s): abb1d54

Added the Balanced Accuracy logic.


- Based on scikit-learn's balanced_accuracy_score.
- Based on the accuracy template.

Files changed (3)
  1. README.md +83 -22
  2. balanced_accuracy.py +94 -55
  3. requirements.txt +2 -1
README.md CHANGED
@@ -1,50 +1,111 @@
  ---
- title: Balanced Accuracy
- datasets:
- -
- tags:
- - evaluate
- - metric
- description: "TODO: add a description here"
  sdk: gradio
  sdk_version: 3.19.1
  app_file: app.py
  pinned: false
  ---

  # Metric Card for Balanced Accuracy

- ***Module Card Instructions:*** *Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*
-
  ## Metric Description
- *Give a brief overview of this metric, including what task(s) it is usually used for, if any.*

  ## How to Use
- *Give general statement of how to use the metric*

- *Provide simplest possible example for using the metric*

  ### Inputs
- *List all input arguments in the format below*
- - **input_field** *(type): Definition of input, with explanation if necessary. State any default value(s).*

  ### Output Values

- *Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*

- *State the range of possible values that the metric's output can take, as well as what in that range is considered good. For example: "This metric can take on any value between 0 and 100, inclusive. Higher scores are better."*

  #### Values from Popular Papers
- *Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*

  ### Examples
- *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*

  ## Limitations and Bias
- *Note any known limitations or biases that the metric has, with links and references if possible.*

- ## Citation
- *Cite the source where this metric was introduced.*

  ## Further References
- *Add any useful further references.*

  ---
+ title: Balanced Accuracy
+ emoji: 🤗
+ colorFrom: blue
+ colorTo: red
  sdk: gradio
  sdk_version: 3.19.1
  app_file: app.py
  pinned: false
+ tags:
+ - evaluate
+ - metric
+ description: >-
+   Balanced Accuracy is the average of recall obtained on each class. It can be computed with:
+   Balanced Accuracy = (TPR + TNR) / N
+   Where:
+   TPR: True positive rate
+   TNR: True negative rate
+   N: Number of classes
  ---

  # Metric Card for Balanced Accuracy

  ## Metric Description
+
+ Balanced Accuracy is the average of recall obtained on each class. It can be computed with:
+
+ Balanced Accuracy = (TPR + TNR) / N
+
+ where:
+ - TPR: true positive rate
+ - TNR: true negative rate
+ - N: number of classes
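
Since the module defers to scikit-learn, the definition can be sanity-checked directly against `sklearn.metrics`; a minimal sketch (not part of the card above) showing that balanced accuracy is the macro-average of per-class recall:

```python
# Minimal sketch: balanced accuracy equals macro-averaged recall in scikit-learn.
from sklearn.metrics import balanced_accuracy_score, recall_score

references = [0, 1, 2, 0, 1, 2]
predictions = [0, 1, 1, 2, 1, 0]

# Per-class recall: class 0 -> 0.5, class 1 -> 1.0, class 2 -> 0.0
print(recall_score(references, predictions, average="macro"))  # 0.5
print(balanced_accuracy_score(references, predictions))        # 0.5
```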

  ## How to Use

+ At minimum, this metric requires predictions and references as inputs.
+
+ ```python
+ >>> balanced_accuracy_metric = evaluate.load("hyperml/balanced_accuracy")
+ >>> results = balanced_accuracy_metric.compute(references=[0, 1], predictions=[0, 1])
+ >>> print(results)
+ {'balanced_accuracy': 1.0}
+ ```

  ### Inputs

+ - **predictions** *(list of int)*: Predicted labels.
+ - **references** *(list of int)*: Ground truth labels.
+ - **sample_weight** *(list of float)*: Sample weights. Defaults to None.
+ - **adjusted** *(boolean)*: If set to True, adjusts the score by accounting for chance. Useful in handling imbalanced datasets. Defaults to False.

  ### Output Values

+ - **balanced_accuracy** *(float)*: Balanced Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0. A higher score means higher balanced accuracy.
+
+ This metric outputs a dictionary containing the balanced accuracy score:
+
+ ```python
+ {'balanced_accuracy': 1.0}
+ ```

  #### Values from Popular Papers

+ Balanced accuracy is often used to report performance on supervised classification tasks such as sentiment analysis or fraud detection, where there is a severe imbalance in the classes.

  ### Examples

+ Example 1: A simple example
+ ```python
+ >>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+ >>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
+ >>> print(results)
+ {'balanced_accuracy': 0.5}
+ ```
+
+ Example 2: The same as Example 1, except with `sample_weight` set.
+ ```python
+ >>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+ >>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4])
+ >>> print(results)
+ {'balanced_accuracy': 0.5}
+ ```
+
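
The weighted result above can be reproduced by hand: with `sample_weight`, each class's recall is taken over the weights rather than the raw counts (a worked sketch, not module code):

```python
# references = [0, 1, 2, 0, 1, 2], predictions = [0, 1, 1, 2, 1, 0]
# weights    = [0.5, 2, 0.7, 0.5, 9, 0.4]
# class 0: 0.5 of 1.0 total weight predicted correctly -> 0.5
# class 1: 11  of 11  total weight predicted correctly -> 1.0
# class 2: 0   of 1.1 total weight predicted correctly -> 0.0
per_class_recall = [0.5, 1.0, 0.0]
print(sum(per_class_recall) / len(per_class_recall))  # 0.5
```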
+ Example 3: The same as Example 1, except with `adjusted` set to `True`.
+ ```python
+ >>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+ >>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], adjusted=True)
+ >>> print(results)
+ {'balanced_accuracy': 0.25}
+ ```
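
For the adjusted variant, the score is rescaled so that chance-level performance maps to 0 while perfect performance stays at 1; the arithmetic behind the value above (a sketch, not module code):

```python
# adjusted = (score - chance) / (1 - chance), with chance = 1 / n_classes
n_classes = 3
raw_score = 0.5               # balanced accuracy from Example 1
chance = 1 / n_classes        # 1/3 for three classes
print((raw_score - chance) / (1 - chance))  # 0.25
```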
 
  ## Limitations and Bias

+ The balanced accuracy metric has limitations when it comes to extreme cases such as perfectly balanced or highly imbalanced datasets. For example, in perfectly balanced datasets, it behaves the same as standard accuracy. However, in highly imbalanced datasets where a class has very few samples, a small change in the prediction for that class can cause a large change in the balanced accuracy score.
+
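A small illustration of that sensitivity, on hypothetical data rather than anything from the card:

```python
from sklearn.metrics import balanced_accuracy_score

# 99 majority-class samples and a single minority-class sample.
references = [0] * 99 + [1]
all_right = [0] * 99 + [1]
one_miss = [0] * 99 + [0]  # only the lone minority sample is misclassified

print(balanced_accuracy_score(references, all_right))  # 1.0
print(balanced_accuracy_score(references, one_miss))   # 0.5, one flipped prediction halves the score
```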
+ ## Citation(s)
+ ```bibtex
+ @article{scikit-learn,
+   title={Scikit-learn: Machine Learning in {P}ython},
+   author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+     and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+     and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+     Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+   journal={Journal of Machine Learning Research},
+   volume={12},
+   pages={2825--2830},
+   year={2011}
+ }
+ ```

  ## Further References

balanced_accuracy.py CHANGED
@@ -1,4 +1,4 @@
- # Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
  #
  # Licensed under the Apache License, Version 2.0 (the "License");
  # you may not use this file except in compliance with the License.
@@ -11,85 +11,124 @@
  # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  # See the License for the specific language governing permissions and
  # limitations under the License.
- """TODO: Add a description here."""

  import evaluate
  import datasets


- # TODO: Add BibTeX citation
- _CITATION = """\
- @InProceedings{huggingface:module,
- title = {A great new module},
- authors={huggingface, Inc.},
- year={2020}
- }
- """

- # TODO: Add description of the module here
- _DESCRIPTION = """\
- This new module is designed to solve this great ML task and is crafted with a lot of care.
  """


- # TODO: Add description of the arguments of the module here
  _KWARGS_DESCRIPTION = """
- Calculates how good are predictions given some references, using certain scores
  Args:
- predictions: list of predictions to score. Each predictions
- should be a string with tokens separated by spaces.
- references: list of reference for each prediction. Each
- reference should be a string with tokens separated by spaces.
  Returns:
- accuracy: description of the first score,
- another_score: description of the second score,
  Examples:
- Examples should be written in doctest format, and should illustrate how
- to use the function.

- >>> my_new_module = evaluate.load("my_new_module")
- >>> results = my_new_module.compute(references=[0, 1], predictions=[0, 1])
- >>> print(results)
- {'accuracy': 1.0}
  """

- # TODO: Define external resources urls if needed
- BAD_WORDS_URL = "http://url/to/external/resource/bad_words.txt"


- @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
- class BalancedAccuracy(evaluate.Metric):
- """TODO: Short description of my evaluation module."""

  def _info(self):
- # TODO: Specifies the evaluate.EvaluationModuleInfo object
  return evaluate.MetricInfo(
- # This is the description that will appear on the modules page.
- module_type="metric",
  description=_DESCRIPTION,
  citation=_CITATION,
  inputs_description=_KWARGS_DESCRIPTION,
- # This defines the format of each prediction and reference
- features=datasets.Features({
- 'predictions': datasets.Value('int64'),
- 'references': datasets.Value('int64'),
- }),
- # Homepage of the module for documentation
- homepage="http://module.homepage",
- # Additional links to the codebase or references
- codebase_urls=["http://github.com/path/to/codebase/of/new_module"],
- reference_urls=["http://path.to.reference.url/new_module"]
  )

- def _download_and_prepare(self, dl_manager):
- """Optional: download external resources useful to compute the scores"""
- # TODO: Download external resources if needed
- pass
-
- def _compute(self, predictions, references):
- """Returns the scores"""
- # TODO: Compute the different scores of the module
- accuracy = sum(i == j for i, j in zip(predictions, references)) / len(predictions)
  return {
- "accuracy": accuracy,
  }

+ # Copyright 2023 HyperML Authors and the current HyperML contributor.
  #
  # Licensed under the Apache License, Version 2.0 (the "License");
  # you may not use this file except in compliance with the License.
  # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  # See the License for the specific language governing permissions and
  # limitations under the License.
+ """Balanced Accuracy metric."""

  import evaluate
  import datasets
+ from sklearn.metrics import balanced_accuracy_score


+ _DESCRIPTION = """
+ Balanced Accuracy is the average of recall obtained on each class. It can be computed with:
+ Balanced Accuracy = (TPR + TNR) / N
+ Where:
+ TPR: True positive rate
+ TNR: True negative rate
+ N: Number of classes
  """


  _KWARGS_DESCRIPTION = """
  Args:
+     predictions (`list` of `int`): Predicted labels.
+     references (`list` of `int`): Ground truth labels.
+     normalize (`boolean`): If set to False, returns the number of correctly classified samples. Otherwise, returns the fraction of correctly classified samples. Defaults to True.
+     sample_weight (`list` of `float`): Sample weights Defaults to None.
+
  Returns:
+     accuracy (`float` or `int`): Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0, or the number of examples input, if `normalize` is set to `True`. A higher score means higher accuracy.
+
  Examples:

+     Example 1-A simple example
+         >>> accuracy_metric = evaluate.load("accuracy")
+         >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
+         >>> print(results)
+         {'accuracy': 0.5}
+
+     Example 2-The same as Example 1, except with `normalize` set to `False`.
+         >>> accuracy_metric = evaluate.load("accuracy")
+         >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], normalize=False)
+         >>> print(results)
+         {'accuracy': 3.0}
+
+     Example 3-The same as Example 1, except with `sample_weight` set.
+         >>> accuracy_metric = evaluate.load("accuracy")
+         >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4])
+         >>> print(results)
+         {'accuracy': 0.8778625954198473}
  """

+ _KWARGS_DESCRIPTION = """
+ Args:
+     predictions (`list` of `int`): Predicted labels.
+     references (`list` of `int`): Ground truth labels.
+     sample_weight (`list` of `float`): Sample weights. Defaults to None.
+     adjusted (`boolean`): When true, the result is adjusted for chance, so that random performance would score 0, while keeping perfect performance at a score of 1. Defaults to False.
+
+ Returns:
+     balanced_accuracy (`float`): Balanced Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0. A higher score means higher balanced accuracy.
+
+ Examples:
+
+     Example 1: A simple example
+         >>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+         >>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
+         >>> print(results)
+         {'balanced_accuracy': 0.5}
+
+     Example 2: The same as Example 1, except with `sample_weight` set.
+         >>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+         >>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4])
+         >>> print(results)
+         {'balanced_accuracy': 0.5}
+
+     Example 3: The same as Example 1, except with `adjusted` set to `True`.
+         >>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+         >>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], adjusted=True)
+         >>> print(results)
+         {'balanced_accuracy': 0.25}
+ """
+
+ _CITATION = """
+ @article{scikit-learn,
+   title={Scikit-learn: Machine Learning in {P}ython},
+   author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+     and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+     and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+     Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+   journal={Journal of Machine Learning Research},
+   volume={12},
+   pages={2825--2830},
+   year={2011}
+ }
+ """
+
+ class BalancedAccuracy(evaluate.Metric):
      def _info(self):
          return evaluate.MetricInfo(
              description=_DESCRIPTION,
              citation=_CITATION,
              inputs_description=_KWARGS_DESCRIPTION,
+             features=datasets.Features(
+                 {
+                     "predictions": datasets.Sequence(datasets.Value("int32")),
+                     "references": datasets.Sequence(datasets.Value("int32")),
+                 }
+                 if self.config_name == "multilabel"
+                 else {
+                     "predictions": datasets.Value("int32"),
+                     "references": datasets.Value("int32"),
+                 }
+             ),
+             reference_urls=["https://scikit-learn.org/stable/modules/generated/sklearn.metrics.balanced_accuracy_score.html"],
          )

+     def _compute(self, predictions, references, sample_weight=None, adjusted=False):
          return {
+             "balanced_accuracy": float(
+                 balanced_accuracy_score(references, predictions, sample_weight=sample_weight, adjusted=adjusted)
+             )
          }
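
A minimal local smoke test for the new script; loading by local file path is an assumption about how it would be exercised, not something this commit adds:

```python
import evaluate
from sklearn.metrics import balanced_accuracy_score

# Load the metric from the local script added in this commit (path assumed).
metric = evaluate.load("balanced_accuracy.py")

refs, preds = [0, 1, 2, 0, 1, 2], [0, 1, 1, 2, 1, 0]
result = metric.compute(references=refs, predictions=preds)

# The wrapper should agree with scikit-learn's balanced_accuracy_score.
assert result["balanced_accuracy"] == balanced_accuracy_score(refs, preds)
print(result)  # {'balanced_accuracy': 0.5}
```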
requirements.txt CHANGED
@@ -1 +1,2 @@
- git+https://github.com/huggingface/evaluate@main
+ git+https://github.com/huggingface/evaluate@main
+ scikit-learn