Spaces · Sleeping

antonioalegria committed · Commit 4912e21 · Parent(s): abb1d54

Added the Balanced Accuracy logic.
- Based on scikit-learn's balanced_accuracy_score.
- Based on the accuracy template.

Files changed:
- README.md +83 -22
- balanced_accuracy.py +94 -55
- requirements.txt +2 -1
README.md
CHANGED
@@ -1,50 +1,111 @@
 ---
-title:
-- evaluate
-- metric
-description: "TODO: add a description here"
+title: Accuracy
+emoji: 🤗
+colorFrom: blue
+colorTo: red
 sdk: gradio
 sdk_version: 3.19.1
 app_file: app.py
 pinned: false
+tags:
+- evaluate
+- metric
+description: >-
+  Balanced Accuracy is the average of recall obtained on each class. It can be computed with:
+  Balanced Accuracy = (TPR + TNR) / N
+  Where:
+  TPR: True positive rate
+  TNR: True negative rate
+  N: Number of classes
 ---

 # Metric Card for Balanced Accuracy

-***Module Card Instructions:*** *Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*
-
 ## Metric Description
+
+Balanced Accuracy is the average of recall obtained on each class. It can be computed with:
+Balanced Accuracy = (TPR + TNR) / N
+Where:
+TPR: True positive rate
+TNR: True negative rate
+N: Number of classes

 ## How to Use
-*Give general statement of how to use the metric*
+
+At minimum, this metric requires predictions and references as inputs.
+
+```python
+>>> accuracy_metric = evaluate.load("hyperml/balanced_accuracy")
+>>> results = accuracy_metric.compute(references=[0, 1], predictions=[0, 1])
+>>> print(results)
+{'balanced_accuracy': 1.0}
+```

 ### Inputs
+
+**predictions** (list of int): Predicted labels.
+**references** (list of int): Ground truth labels.
+**sample_weight** (list of float): Sample weights. Defaults to None.
+**adjusted** (boolean): If set to True, adjusts the score by accounting for chance. Useful in handling imbalanced datasets. Defaults to False.

 ### Output Values

+- **balanced_accuracy** (float): Balanced Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0. A higher score means higher balanced accuracy.
+
+Output Example(s):
+```python
+{'balanced_accuracy': 1.0}
+```
+
+This metric outputs a dictionary containing the balanced accuracy score.

 #### Values from Popular Papers
+
+Balanced accuracy is often used to report performance on supervised classification tasks such as sentiment analysis or fraud detection, where there is a severe imbalance in the classes.

 ### Examples
+
+Example 1-A simple example
+```python
+>>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+>>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
+>>> print(results)
+{'balanced_accuracy': 0.5}
+```
+
+Example 2-The same as Example 1, except with `sample_weight` set.
+```python
+>>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+>>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4])
+>>> print(results)
+{'balanced_accuracy': 0.8778625954198473} # TODO: check if this is correct
+```
+
+Example 3-The same as Example 1, except with `adjusted` set to `True`.
+```python
+>>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+>>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], adjusted=True)
+>>> print(results)
+{'balanced_accuracy': 0.8} # TODO: check if this is correct
+```

 ## Limitations and Bias
-*Note any known limitations or biases that the metric has, with links and references if possible.*
+
+The balanced accuracy metric has limitations when it comes to extreme cases such as perfectly balanced or highly imbalanced datasets. For example, in perfectly balanced datasets, it behaves the same as standard accuracy. However, in highly imbalanced datasets where a class has very few samples, a small change in the prediction for that class can cause a large change in the balanced accuracy score.
+
+## Citation(s)
+```bibtex
+@article{scikit-learn,
+  title={Scikit-learn: Machine Learning in {P}ython},
+  author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+          and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+          and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+          Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+  journal={Journal of Machine Learning Research},
+  volume={12},
+  pages={2825--2830},
+  year={2011}
+}
+```

 ## Further References
-*Add any useful further references.*
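The example outputs flagged above with "# TODO: check if this is correct" can be recomputed directly against scikit-learn, which this metric wraps. A minimal verification sketch (not part of the committed files; the `recall_score` call is only an illustration that balanced accuracy is the mean of per-class recall):

```python
# Recompute the metric card's examples straight from scikit-learn to fill in the TODOs.
from sklearn.metrics import balanced_accuracy_score, recall_score

references = [0, 1, 2, 0, 1, 2]
predictions = [0, 1, 1, 2, 1, 0]

# Balanced accuracy equals the macro-averaged (per-class) recall.
print(balanced_accuracy_score(references, predictions))        # 0.5, as in Example 1
print(recall_score(references, predictions, average="macro"))  # same value by definition

# Reference values for Example 2 (sample_weight) and Example 3 (adjusted).
print(balanced_accuracy_score(references, predictions, sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4]))
print(balanced_accuracy_score(references, predictions, adjusted=True))
```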
balanced_accuracy.py
CHANGED
@@ -1,4 +1,4 @@
-# Copyright
+# Copyright 2023 HyperML Authors and the current HyperML contributor.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -11,85 +11,124 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-"""
+"""Balanced Accuracy metric."""

 import evaluate
 import datasets
+from sklearn.metrics import accuracy_score
+from sklearn.metrics import balanced_accuracy_score


-# TODO: Add BibTeX citation
-_CITATION = """\
-@InProceedings{huggingface:module,
-title = {A great new module},
-authors={huggingface, Inc.},
-year={2020}
-}
-"""
-
+_DESCRIPTION = """
+Balanced Accuracy is the average of recall obtained on each class. It can be computed with:
+Balanced Accuracy = (TPR + TNR) / N
+Where:
+TPR: True positive rate
+TNR: True negative rate
+N: Number of classes
 """


-# TODO: Add description of the arguments of the module here
 _KWARGS_DESCRIPTION = """
-Calculates how good are predictions given some references, using certain scores
 Args:
-    predictions
+    predictions (`list` of `int`): Predicted labels.
+    references (`list` of `int`): Ground truth labels.
+    normalize (`boolean`): If set to False, returns the number of correctly classified samples. Otherwise, returns the fraction of correctly classified samples. Defaults to True.
+    sample_weight (`list` of `float`): Sample weights. Defaults to None.
+
 Returns:
-    accuracy:
+    accuracy (`float` or `int`): Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0, or the number of examples input, if `normalize` is set to `True`. A higher score means higher accuracy.
+
 Examples:
-    Examples should be written in doctest format, and should illustrate how
-    to use the function.

+    Example 1-A simple example
+        >>> accuracy_metric = evaluate.load("accuracy")
+        >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
+        >>> print(results)
+        {'accuracy': 0.5}
+
+    Example 2-The same as Example 1, except with `normalize` set to `False`.
+        >>> accuracy_metric = evaluate.load("accuracy")
+        >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], normalize=False)
+        >>> print(results)
+        {'accuracy': 3.0}
+
+    Example 3-The same as Example 1, except with `sample_weight` set.
+        >>> accuracy_metric = evaluate.load("accuracy")
+        >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4])
+        >>> print(results)
+        {'accuracy': 0.8778625954198473}
 """

+_KWARGS_DESCRIPTION = """
+Args:
+    predictions (`list` of `int`): Predicted labels.
+    references (`list` of `int`): Ground truth labels.
+    sample_weight (`list` of `float`): Sample weights. Defaults to None.
+    adjusted (`boolean`): When true, the result is adjusted for chance, so that random performance would score 0, while keeping perfect performance at a score of 1. Defaults to False.
+
+Returns:
+    balanced_accuracy (`float`): Balanced Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0. A higher score means higher balanced accuracy.
+
+Examples:
+
+    Example 1-A simple example
+        >>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+        >>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
+        >>> print(results)
+        {'balanced_accuracy': 0.5}
+
+    Example 2-The same as Example 1, except with `sample_weight` set.
+        >>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+        >>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4])
+        >>> print(results)
+        {'balanced_accuracy': 0.8778625954198473} # TODO: check if this is correct
+
+    Example 3-The same as Example 1, except with `adjusted` set to `True`.
+        >>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
+        >>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], adjusted=True)
+        >>> print(results)
+        {'balanced_accuracy': 0.8} # TODO: check if this is correct
+"""
+
+_CITATION = """
+@article{scikit-learn,
+  title={Scikit-learn: Machine Learning in {P}ython},
+  author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+          and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+          and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+          Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+  journal={Journal of Machine Learning Research},
+  volume={12},
+  pages={2825--2830},
+  year={2011}
+}
+"""
+
+class BalancedAccuracy(evaluate.Metric):
     def _info(self):
-        # TODO: Specifies the evaluate.EvaluationModuleInfo object
         return evaluate.MetricInfo(
-            # This is the description that will appear on the modules page.
-            module_type="metric",
             description=_DESCRIPTION,
             citation=_CITATION,
             inputs_description=_KWARGS_DESCRIPTION,
+            features=datasets.Features(
+                {
+                    "predictions": datasets.Sequence(datasets.Value("int32")),
+                    "references": datasets.Sequence(datasets.Value("int32")),
+                }
+                if self.config_name == "multilabel"
+                else {
+                    "predictions": datasets.Value("int32"),
+                    "references": datasets.Value("int32"),
+                }
+            ),
+            reference_urls=["https://scikit-learn.org/stable/modules/generated/sklearn.metrics.balanced_accuracy_score.html"],
         )

-    def
-        """Optional: download external resources useful to compute the scores"""
-        # TODO: Download external resources if needed
-        pass
-
-    def _compute(self, predictions, references):
-        """Returns the scores"""
-        # TODO: Compute the different scores of the module
-        accuracy = sum(i == j for i, j in zip(predictions, references)) / len(predictions)
+    def _compute(self, predictions, references, sample_weight=None, adjusted=False):
         return {
+            "balanced_accuracy": float(
+                balanced_accuracy_score(references, predictions, sample_weight=sample_weight, adjusted=adjusted)
+            )
         }
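The `adjusted` flag that `_compute` forwards to `balanced_accuracy_score` applies scikit-learn's chance correction: the raw balanced accuracy is rescaled so that random guessing maps to 0 and perfect prediction stays at 1. A small sketch of that rescaling (not part of the commit; it assumes every class appears at least once in `references`):

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

references = [0, 1, 2, 0, 1, 2]
predictions = [0, 1, 1, 2, 1, 0]

raw = balanced_accuracy_score(references, predictions)
chance = 1.0 / len(np.unique(references))   # expected score of a random classifier
adjusted = (raw - chance) / (1.0 - chance)  # rescale so that chance -> 0, perfect -> 1

# The manual rescaling agrees with the library's adjusted score.
assert np.isclose(adjusted, balanced_accuracy_score(references, predictions, adjusted=True))
print(raw, adjusted)
```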
requirements.txt
CHANGED
@@ -1 +1,2 @@
-git+https://github.com/huggingface/evaluate@main
+git+https://github.com/huggingface/evaluate@main
+scikit-learn
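With scikit-learn added to the requirements, the module can be smoke-tested end to end. A hypothetical local check (it assumes the two requirements above are installed and that the metric is loaded through the `hyperml/balanced_accuracy` path used in the README):

```python
# Load the metric from the Space and run it on a toy example; the expected output
# is a dict with a single "balanced_accuracy" key, as documented in the README.
import evaluate

balanced_accuracy = evaluate.load("hyperml/balanced_accuracy")
result = balanced_accuracy.compute(
    references=[0, 1, 2, 0, 1, 2],
    predictions=[0, 1, 1, 2, 1, 0],
)
print(result)  # e.g. {'balanced_accuracy': 0.5}
```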