PriyaPatel AneriThakkar committed on
Commit
4f4c9b5
·
verified ·
1 Parent(s): 65bc9dc

Update README.md (#1)

- Update README.md (c028e094caa866c94341bb846a4443aa1334adb8)


Co-authored-by: Thakkar Aneri Pareshkumar <[email protected]>

Files changed (1)
  1. README.md +117 -34
README.md CHANGED
@@ -1,58 +1,141 @@
1
- ---
2
- tags:
3
- - generated_from_keras_callback
4
- model-index:
5
- - name: bias_identificaiton45
6
- results: []
7
- ---
8
 
9
  <!-- This model card has been generated automatically according to the information Keras had access to. You should
10
  probably proofread and complete it, then remove this comment. -->
 
 
11
 
12
- # race color - 0,
13
- # socioeconomic - 1,
14
- # gender - 2,
15
- # disability - 3,
16
- # nationality - 4,
17
- # sexualorientation - 5,
18
- # physical-appearance - 6,
19
- # religion - 7,
20
- # age - 8.
21
- # Proffesion - 9.
 
22
 
23
- # bias_identificaiton45
24
 
25
- This model was trained from scratch on an unknown dataset.
26
- It achieves the following results on the evaluation set:
27
 
28
 
29
  ## Model description
30
 
31
- More information needed
 
32
 
33
  ## Intended uses & limitations
34
 
35
- More information needed
36
 
37
- ## Training and evaluation data
38
 
39
- More information needed
40
 
41
  ## Training procedure
42
 
43
- ### Training hyperparameters
 
 
 
 
44
 
45
- The following hyperparameters were used during training:
46
- - optimizer: {'name': 'Adam', 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'jit_compile': True, 'is_legacy_optimizer': False, 'learning_rate': 1e-05, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False}
47
- - training_precision: float32
48
 
49
- ### Training results
 
 
 
 
50
 
 
 
 
51
 
 
 
 
52
 
53
- ### Framework versions
54
 
55
- - Transformers 4.39.3
56
- - TensorFlow 2.15.0
57
- - Datasets 2.18.0
58
- Tokenizers 0.15.2
1
+ ---
2
+ tags:
3
+ - generated_from_keras_callback
4
+ model-index:
5
+ - name: bias_identificaiton45
6
+ results: []
7
+ datasets:
8
+ - PriyaPatel/Bias_identification
9
+ metrics:
10
+ - accuracy
11
+ base_model: cardiffnlp/twitter-roberta-base-sentiment-latest
12
+ pipeline_tag: text-classification
13
+ ---
14
 
15
  <!-- This model card has been generated automatically according to the information Keras had access to. You should
16
  probably proofread and complete it, then remove this comment. -->
17
+ <!--
18
+ The dataset includes 10 types of biases, each labeled for easy identification. The biases and their corresponding labels are as follows:
19
 
20
+ 1. **Race/Color** - `0`
21
+ 2. **Socioeconomic Status** - `1`
22
+ 3. **Gender** - `2`
23
+ 4. **Disability** - `3`
24
+ 5. **Nationality** - `4`
25
+ 6. **Sexual Orientation** - `5`
26
+ 7. **Physical Appearance** - `6`
27
+ 8. **Religion** - `7`
28
+ 9. **Age** - `8`
29
+ 10. **Profession** - `9`
30
+ -->
31
 
32
+ <!-- # bias_identificaiton45
33
 
34
+ This dataset was compiled to analyze various types of stereotypical biases present in language models. It incorporates data from multiple publicly available datasets, each contributing to the identification of specific bias types.
35
+
36
+ Dataset link: [PriyaPatel/Bias_identification](https://huggingface.co/datasets/PriyaPatel/Bias_identification)
37
+
38
+ The biases are labeled as follows:
39
+
40
+ 1. **Race/Color** - `0`
41
+ 2. **Socioeconomic Status** - `1`
42
+ 3. **Gender** - `2`
43
+ 4. **Disability** - `3`
44
+ 5. **Nationality** - `4`
45
+ 6. **Sexual Orientation** - `5`
46
+ 7. **Physical Appearance** - `6`
47
+ 8. **Religion** - `7`
48
+ 9. **Age** - `8`
49
+ 10. **Profession** - `9` -->
50
+
51
+
52
+
53
+ <!-- ### Framework versions
54
+
55
+ - Transformers 4.39.3
56
+ - TensorFlow 2.15.0
57
+ - Datasets 2.18.0
58
+ - Tokenizers 0.15.2 -->
59
 
60
 
61
  ## Model description
62
 
63
+ This model is a fine-tuned version of `cardiffnlp/twitter-roberta-base-sentiment-latest`, trained on a custom dataset for bias identification in large language models. It classifies input text into one of 10 bias categories.
64
+
65
 
66
  ## Intended uses & limitations
67
 
68
+ ### Intended Uses:
69
+ - **Bias Detection:** Identifying and categorizing bias types in sentences or text fragments.
70
+ - **Research:** Analyzing and understanding biases in natural language processing models.
71
+
72
+ ### Limitations:
73
+ - **Domain Specificity:** The model is tuned to the domains represented in the training data, so its accuracy may drop on text from other domains.
74
+ - **Not for General Sentiment Analysis:** This model is not designed for general sentiment analysis or other NLP tasks.
75
+
76
 
77
+ ## Dataset Used for Training
78
+
79
+ This dataset was compiled to analyze various types of stereotypical biases present in language models. It incorporates data from multiple publicly available datasets, each contributing to the identification of specific bias types.
80
+
81
+ Dataset link: [PriyaPatel/Bias_identification](https://huggingface.co/datasets/PriyaPatel/Bias_identification)
82
+
83
+ The biases are labeled as follows:
84
+
85
+ 1. **Race/Color** - `0`
86
+ 2. **Socioeconomic Status** - `1`
87
+ 3. **Gender** - `2`
88
+ 4. **Disability** - `3`
89
+ 5. **Nationality** - `4`
90
+ 6. **Sexual Orientation** - `5`
91
+ 7. **Physical Appearance** - `6`
92
+ 8. **Religion** - `7`
93
+ 9. **Age** - `8`
94
+ 10. **Profession** - `9`
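
The label scheme above can be captured as a small lookup table. The following is a minimal sketch and not part of the original card: the names `ID2LABEL` and `decode_prediction` are hypothetical, and it assumes the model returns one raw score (logit) per category in the order listed.

```python
import math

# Label IDs as listed in the card; the constant name is illustrative.
ID2LABEL = {
    0: "Race/Color",
    1: "Socioeconomic Status",
    2: "Gender",
    3: "Disability",
    4: "Nationality",
    5: "Sexual Orientation",
    6: "Physical Appearance",
    7: "Religion",
    8: "Age",
    9: "Profession",
}


def decode_prediction(logits):
    """Return (label name, softmax probability) for the highest-scoring class.

    Assumes `logits` holds exactly one raw score per label ID, in ID order.
    """
    if len(logits) != len(ID2LABEL):
        raise ValueError("expected one logit per label")
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    best = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best], exps[best] / total
```

Given a list of ten scores for one sentence, `decode_prediction(scores)` would return the category name together with the probability assigned to it.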
95
 
 
96
 
97
  ## Training procedure
98
 
99
+ - **Base Model:** `cardiffnlp/twitter-roberta-base-sentiment-latest`
100
+ - **Optimizer:** Adam with a learning rate of 0.00001
101
+ - **Loss Function:** Sparse Categorical Crossentropy
102
+ - **Batch Size:** 20
103
+ - **Epochs:** 3
104
 
105
+ ## Training hyperparameters
 
 
106
 
107
+ - **Learning Rate:** 0.00001
108
+ - **Optimizer:** Adam
109
+ - **Loss Function:** Sparse Categorical Crossentropy
110
+ - **Batch Size:** 20
111
+ - **Epochs:** 3
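
As a rough illustration of how these settings could map onto Keras, here is a minimal sketch. It is an assumption, not the authors' training script: `TRAINING_CONFIG` and `build_optimizer_and_loss` are hypothetical names, and `from_logits=True` assumes the classification head returns raw scores rather than probabilities.

```python
# Hyperparameters as stated in the card; the dict name is illustrative.
TRAINING_CONFIG = {
    "learning_rate": 1e-5,
    "batch_size": 20,
    "epochs": 3,
    "num_labels": 10,
}


def build_optimizer_and_loss(config=TRAINING_CONFIG):
    """Create an Adam optimizer and a sparse categorical cross-entropy loss.

    TensorFlow is imported lazily so the config can be inspected without it.
    """
    import tensorflow as tf

    optimizer = tf.keras.optimizers.Adam(learning_rate=config["learning_rate"])
    # from_logits=True is an assumption: it fits a model head that returns
    # raw scores rather than probabilities.
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    return optimizer, loss
```

With a compiled Keras model, the batch size and epoch count would typically be passed to `model.fit(..., batch_size=20, epochs=3)`.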
112
 
113
+ <!-- It achieves the following results on the validation dataset:
114
+ val_loss = 0.0744
115
+ val_accuracy = 0.9825
116
 
117
+ And the results on the testing dataset:
118
+ loss = 0.0715
119
+ accuracy = 0.9832 -->
120
 
121
+ ## Training Results
122
 
123
+ - **Validation Loss:** 0.0744
124
+ - **Validation Accuracy:** 0.9825
125
+ - **Test Loss:** 0.0715
126
+ - **Test Accuracy:** 0.9832
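
For readers unfamiliar with these metrics, the sketch below shows how sparse categorical cross-entropy and accuracy are conventionally computed from raw logits and integer labels. It is a plain-Python illustration with hypothetical function names, not the evaluation code behind the figures above.

```python
import math


def sparse_categorical_crossentropy(logits, label):
    """Negative log-probability of the true class for one example."""
    peak = max(logits)
    # Log-sum-exp with the maximum subtracted for numerical stability.
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[label]


def evaluate(batch_logits, labels):
    """Return (mean loss, accuracy) over a batch of examples."""
    losses = [
        sparse_categorical_crossentropy(scores, y)
        for scores, y in zip(batch_logits, labels)
    ]
    correct = sum(
        1
        for scores, y in zip(batch_logits, labels)
        if max(range(len(scores)), key=lambda i: scores[i]) == y
    )
    return sum(losses) / len(labels), correct / len(labels)
```

Both functions take integer class IDs directly, which is why the "sparse" variant of the loss needs no one-hot encoding of the labels.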
127
+
128
+ ## How to Load the Model
129
+
130
+ You can load the model using the Hugging Face `transformers` library as follows:
131
+
+ ```python
+ # Load the TensorFlow/Keras weights (the card lists TensorFlow 2.15.0)
+ from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
+
+ tokenizer = AutoTokenizer.from_pretrained("PriyaPatel/bias_identificaiton45")
+ model = TFAutoModelForSequenceClassification.from_pretrained("PriyaPatel/bias_identificaiton45")
+
+ # Example usage: tokenize to TensorFlow tensors and run a forward pass
+ inputs = tokenizer("Your text here", return_tensors="tf")
+ outputs = model(**inputs)  # outputs.logits holds one score per bias category