Angelina Wang committed on
Commit 66deede · 1 Parent(s): 41f9efc

initial metric files

Files changed (4)
  1. README.md +53 -5
  2. app.py +6 -0
  3. directional_bias_amplification.py +103 -0
  4. requirements.txt +0 -0
README.md CHANGED
@@ -1,12 +1,60 @@
  ---
- title: Directional_bias_amplification
- emoji: 🌖
- colorFrom: gray
- colorTo: pink
+ title: Directional Bias Amplification
+ emoji: 🌴
+ colorFrom: purple
+ colorTo: blue
  sdk: gradio
  sdk_version: 3.0.12
  app_file: app.py
  pinned: false
+ tags:
+ - evaluate
+ - metric
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Metric Card for Directional Bias Amplification
+
+ ## Metric Description
+ Directional Bias Amplification is a metric that captures the amount of bias (i.e., a conditional probability) that is amplified. This metric was introduced in the ICML 2021 paper ["Directional Bias Amplification"](https://arxiv.org/abs/2102.12594).
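+
+ Concretely, the quantity computed here (a sketch in the paper's notation, for the attribute-to-task direction this module implements) is
+
+ $$\text{BiasAmp}_{A \rightarrow T} = \frac{1}{|A||T|} \sum_{a,t} y_{at} \, \Delta_{at}, \qquad y_{at} = \mathrm{sign}\left(P(A_a{=}1, T_t{=}1) - P(A_a{=}1)P(T_t{=}1)\right),$$
+
+ $$\Delta_{at} = P(\hat{T}_t{=}1 \mid A_a{=}1) - P(T_t{=}1 \mid A_a{=}1),$$
+
+ i.e., the shift in each task label's prediction rate within each attribute group, signed by the direction of the ground-truth correlation between the attribute and the task.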
+
+ ## How to Use
+ This metric operates on multi-label (including binary) classification settings where each image has one or more associated sensitive attributes.
+ This metric requires three sets of inputs:
+ - Predictions representing the model output on the task (`predictions`)
+ - Ground-truth labels on the task (`references`)
+ - Ground-truth labels on the sensitive attribute of interest (`attributes`)
+
+ ### Inputs
+ - **predictions** (`array` of `int`): Predicted task labels. Array of size n x |T|, where n is the number of samples and |T| is the number of task labels. All values are binary 0 or 1.
+ - **references** (`array` of `int`): Ground-truth task labels. Array of size n x |T|, where n is the number of samples and |T| is the number of task labels. All values are binary 0 or 1.
+ - **attributes** (`array` of `int`): Ground-truth attribute labels. Array of size n x |A|, where n is the number of samples and |A| is the number of attribute labels. All values are binary 0 or 1.
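+
+ For instance, a minimal sketch of well-formed inputs (the values here are illustrative): with n = 4 samples, |T| = 2 task labels, and |A| = 1 attribute, each argument is an n-row nested list of 0/1 indicators.
+
+ ```python
+ predictions = [[1, 0], [0, 1], [1, 1], [0, 0]]  # n x |T| predicted task indicators
+ references  = [[1, 0], [0, 0], [1, 1], [0, 1]]  # n x |T| ground-truth task indicators
+ attributes  = [[1], [0], [1], [0]]              # n x |A| ground-truth attribute indicators
+ ```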
+
+ ### Output Values
+ - **bias_amplification** (`float`): Overall bias amplification value. Possible values range from -1.0 to 1.0; the higher the value, the more "bias" is amplified, while negative values indicate that the model reduces the bias present in the data.
+ - **disagg_bias_amplification** (`array` of `float`): Array of size |A| x |T|. Each value is the bias amplification for that particular task label conditioned on that particular attribute label.
+
36
+ ### Examples
37
+
38
+ Imagine a scenario with 3 individuals in Group A and 5 individuals in Group B. Task label `1` is biased because 2 of the 3 individuals in Group A have it, whereas only 1 of the 5 individuals in Group B do. The model amplifies this bias, and predicts all members of Group A to have task label `1`, and no members of Group B to have task label `1`.
39
+
40
+ ```python
41
+ >>> bias_amp_metric = evaluate.load("directional_bias_amplification")
42
+ >>> results = bias_amp_metric.compute(references=[[0], [1], [1], [0], [0], [0], [0], [1]], predictions=[[1], [1], [1], [0], [0], [0], [0], [0]], attributes=[[0], [0], [0], [1], [1], [1], [1], [1]])
43
+ >>> print(results)
44
+ {'bias_amplification': (0.2667, 'disagg_bias_amplification': [[0.3333], [0.2]]}
45
+ ```
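+
+ To see where these numbers come from: within Group A the rate of task label `1` rises from 2/3 in the references to 1 in the predictions (a shift of 0.3333 in the direction of that group's positive correlation with the label), while within Group B it falls from 1/5 to 0 (a shift of 0.2 in the direction of that group's negative correlation); the mean of the two is 0.2667. A standalone sketch of this arithmetic (using one-hot attribute columns for the two groups, an illustrative choice rather than the module's own API):
+
+ ```python
+ import numpy as np
+
+ preds  = np.array([[1], [1], [1], [0], [0], [0], [0], [0]])
+ labels = np.array([[0], [1], [1], [0], [0], [0], [0], [1]])
+ attrs  = np.array([[1, 0]] * 3 + [[0, 1]] * 5)  # one-hot columns: Group A, Group B
+
+ vals = np.zeros((attrs.shape[1], labels.shape[1]))
+ for a in range(attrs.shape[1]):
+     for t in range(labels.shape[1]):
+         in_a = attrs[:, a] == 1
+         # sign of the ground-truth correlation between attribute a and task t
+         y = np.sign((attrs[:, a] * labels[:, t]).mean() - attrs[:, a].mean() * labels[:, t].mean())
+         # shift in P(T_t = 1 | A_a = 1) from ground truth to predictions
+         delta = preds[in_a, t].mean() - labels[in_a, t].mean()
+         vals[a, t] = y * delta
+
+ print(round(vals.mean(), 4), np.round(vals, 4).tolist())  # 0.2667 [[0.3333], [0.2]]
+ ```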
+
+ ## Limitations and Bias
+ A strong assumption made by this metric is that ground-truth labels exist, are known, and are agreed upon. Further, a perfectly accurate model achieves zero bias amplification, yet such a model still perpetuates whatever biases are already present in the data.
+
+ Please refer to Sec. 5.3, "Limitations of Bias Amplification", of ["Directional Bias Amplification"](https://arxiv.org/abs/2102.12594) for a more detailed discussion.
+
+ ## Citation(s)
+ ```bibtex
+ @inproceedings{wang2021biasamp,
+     author = {Angelina Wang and Olga Russakovsky},
+     title = {Directional Bias Amplification},
+     booktitle = {International Conference on Machine Learning (ICML)},
+     year = {2021}
+ }
+ ```
+
+ ## Further References
app.py ADDED
@@ -0,0 +1,6 @@
+ import evaluate
+ from evaluate.utils import launch_gradio_widget
+
+
+ module = evaluate.load("directional_bias_amplification")
+ launch_gradio_widget(module)
directional_bias_amplification.py ADDED
@@ -0,0 +1,107 @@
+ # Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """Directional Bias Amplification metric."""
+
+ import datasets
+ import numpy as np
+
+ import evaluate
+
+ _DESCRIPTION = """
+ Directional Bias Amplification is a metric that captures the amount of bias (i.e., a conditional probability) that is amplified.
+ This metric was introduced in the ICML 2021 paper "Directional Bias Amplification" (https://arxiv.org/abs/2102.12594).
+ """
+
+ _KWARGS_DESCRIPTION = """
+ Args:
+     predictions (`array` of `int`): Predicted task labels. Array of size n x |T|, where n is the number of samples and |T| is the number of task labels. All values are binary 0 or 1.
+     references (`array` of `int`): Ground-truth task labels. Array of size n x |T|, where n is the number of samples and |T| is the number of task labels. All values are binary 0 or 1.
+     attributes (`array` of `int`): Ground-truth attribute labels. Array of size n x |A|, where n is the number of samples and |A| is the number of attribute labels. All values are binary 0 or 1.
+
+ Returns:
+     bias_amplification (`float`): Overall bias amplification value; the higher the value, the more "bias" is amplified.
+     disagg_bias_amplification (`array` of `float`): Array of size |A| x |T|. Each value is the bias amplification for that particular task label conditioned on that particular attribute label.
+ """
+
+
+ _CITATION = """
+ @inproceedings{wang2021biasamp,
+     author = {Angelina Wang and Olga Russakovsky},
+     title = {Directional Bias Amplification},
+     booktitle = {International Conference on Machine Learning (ICML)},
+     year = {2021}
+ }
+ """
+
+
+ @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
+ class DirectionalBiasAmplification(evaluate.EvaluationModule):
+     def _info(self):
+         return evaluate.EvaluationModuleInfo(
+             description=_DESCRIPTION,
+             citation=_CITATION,
+             inputs_description=_KWARGS_DESCRIPTION,
+             features=datasets.Features(
+                 {
+                     "predictions": datasets.Sequence(datasets.Value("int32")),
+                     "references": datasets.Sequence(datasets.Value("int32")),
+                     "attributes": datasets.Sequence(datasets.Value("int32")),
+                 }
+             ),
+             reference_urls=["https://arxiv.org/abs/2102.12594"],
+         )
+
+     def _compute(self, predictions, references, attributes):
+         # convert to numpy arrays so the shape checks and indexing below work
+         task_preds, task_labels, attribute_labels = np.array(predictions), np.array(references), np.array(attributes)
+
+         assert len(task_labels.shape) == 2 and len(attribute_labels.shape) == 2, 'Please read the shape of the expected inputs, which should be "num samples" by "num classification items"'
+         assert len(task_labels) == len(attribute_labels) == len(task_preds), 'Please make sure the number of samples in the three input arrays is the same.'
+
+         num_t, num_a = task_labels.shape[1], attribute_labels.shape[1]
+
+         # only include samples that have at least one task or attribute label
+         keep_indices = np.array(list(set(np.where(np.sum(task_labels, axis=1) > 0)[0]).union(set(np.where(np.sum(attribute_labels, axis=1) > 0)[0]))))
+         task_labels_ind, attribute_labels_ind = task_labels[keep_indices], attribute_labels[keep_indices]
+
+         # y_at calculation: sign of the ground-truth correlation between each attribute a and task t
+         p_at = np.zeros((num_a, num_t))
+         p_a_p_t = np.zeros((num_a, num_t))
+         num = len(task_labels_ind)
+         for a in range(num_a):
+             for t in range(num_t):
+                 t_indices = np.where(task_labels_ind[:, t] == 1)[0]
+                 a_indices = np.where(attribute_labels_ind[:, a] == 1)[0]
+                 at_indices = set(t_indices) & set(a_indices)
+                 p_a_p_t[a][t] = (len(t_indices) / num) * (len(a_indices) / num)
+                 p_at[a][t] = len(at_indices) / num
+         y_at = np.sign(p_at - p_a_p_t)
+
+         # delta_at calculation: shift in P(T_t = 1 | A_a = 1) from ground truth to predictions
+         t_cond_a = np.zeros((num_a, num_t))
+         that_cond_a = np.zeros((num_a, num_t))
+         for a in range(num_a):
+             for t in range(num_t):
+                 t_cond_a[a][t] = np.mean(task_labels[:, t][np.where(attribute_labels[:, a] == 1)[0]])
+                 that_cond_a[a][t] = np.mean(task_preds[:, t][np.where(attribute_labels[:, a] == 1)[0]])
+         delta_at = that_cond_a - t_cond_a
+
+         # aggregate the signed per-(attribute, task) shifts, ignoring pairs with no support
+         values = y_at * delta_at
+         val = np.nanmean(values)
+
+         return {
+             "bias_amplification": val,
+             "disagg_bias_amplification": values,
+         }
requirements.txt ADDED
File without changes