akhooli committed
Commit 867ad1e
1 Parent(s): 7115f1a

Push model using huggingface_hub.

Files changed (4)
  1. README.md +74 -90
  2. config_setfit.json +2 -2
  3. model.safetensors +1 -1
  4. model_head.pkl +2 -2
README.md CHANGED
@@ -10,12 +10,15 @@ tags:
  - text-classification
  - generated_from_setfit_trainer
  widget:
- - text: 'دغري بدكم تفوتو بخصوصيات الناس طيب ما اموال كتار معروفة و مش معروفة منوين
- جابتهن بتفتح... '
- - text: ايها السادة العرب الوزير جبران باسيل يتكلم باسمه الشخصي
- - text: 'وكل مين بدو يشد على مشدو '
- - text: لازم جائزة نوبل للكيميا ياخدها دكتاتور البعث الفاشي
- - text: 'زرع شعراته ولوووووو فيهن '
+ - text: عزيزي جبران باسيل بدك تعرف كتييير منيح انو مش شغلتنا نحفظ امن اسرائيل يلي
+ ما منعترف ولن نعترف ب وجودها ابدا
+ - text: 'يجب على هؤلاك المجرمون الارهابيون وكل من دس فتنة انا يتحاسبو حساب مؤلم لكن
+ سؤال من سيحاسبهن '
+ - text: شيل عينك عن لبنان انت و كل كلب متلك حكايتك و غير هيك انشالله بتنباع بالعزى
+ - text: لسه بصرعوا طيزنا بدكن نصير متل العراق وليبيا يا حمير تجاوزناهن بأشواط، هلق
+ لو نصير متل العراق وليبيا تحسن كبير جدا
+ - text: كول هوا خسرتو بأرضك وبين جمهورك بعد ما منعت القطريين من تشجيع جمهورهم انتو
+ فاشلين في كل شئ وهم متفوقين عليكم في...
  inference: true
  model-index:
  - name: SetFit with akhooli/sbert_ar_nli_500k_norm
@@ -29,39 +32,14 @@ model-index:
  split: test
  metrics:
  - type: accuracy
- value: 0.8506944444444444
+ value: 0.8452520515826495
  name: Accuracy
  ---

  # SetFit with akhooli/sbert_ar_nli_500k_norm

- This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification.
- This SetFit model uses [akhooli/sbert_ar_nli_500k_norm](https://huggingface.co/akhooli/sbert_ar_nli_500k_norm) as the Sentence Transformer embedding model.
- A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
- This model is trained with few shots using the [akhooli/ar_hs](https://huggingface.co/datasets/akhooli/ar_hs) dataset. The dataset uses LLM to generate labels.
- Usage:
- ```python
- pip install setfit
- from setfit import SetFitModel
- from unicodedata import normalize
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [akhooli/sbert_ar_nli_500k_norm](https://huggingface.co/akhooli/sbert_ar_nli_500k_norm) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.

- # Download model from Hub
- model = SetFitModel.from_pretrained("akhooli/setfit_ar_hs")
- # Run inference
- queries = [
- "سكت دهراً و نطق كفراً",
- "الخلاف ﻻ يفسد للود قضية.",
- "أنت شخص منبوذ. احترم أسيادك.",
- "دع المكارم ﻻ ترحل لبغيتها واقعد فإنك أنت الطاعم الكاسي",
- ]
- queries_n = [normalize('NFKC', query) for query in queries]
- preds = model.predict(queries_n)
- print(preds)
- # if you want to see the probabilities for each label
- probas = model.predict_proba(queries_n)
- print(probas)
- ```
- The rest of this card is auto generated.
  The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
@@ -74,7 +52,7 @@ The model has been trained using an efficient few-shot learning technique that i
  - **Sentence Transformer body:** [akhooli/sbert_ar_nli_500k_norm](https://huggingface.co/akhooli/sbert_ar_nli_500k_norm)
  - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
  - **Maximum Sequence Length:** 512 tokens
- - **Number of Classes:** 3 classes
+ - **Number of Classes:** 2 classes
  <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
  <!-- - **Language:** Unknown -->
  <!-- - **License:** Unknown -->
@@ -85,12 +63,18 @@ The model has been trained using an efficient few-shot learning technique that i
  - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
  - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)

+ ### Model Labels
+ | Label | Examples |
+ |:---------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+ | negative | <ul><li>'يا ريت بيمنعوا الأرغيلة بلبنان، لأن غير هيك ما منعمل ثورة '</li><li>'أصلا جبران عندو طيارة وعندو قصر بأوروبا ومحيط الهادىء الى اسهم فيه وتم اكتشاف كوكب جديد مثل زحل وجوبيتير تم شرائه ك...'</li><li>'اكره البرازيل بس لا تقوليلي خلاص كلشي انتهى بليز'</li></ul> |
+ | positive | <ul><li>'السيد والرئيس وليش عم تشددددد دخلك كل حجمك أرنب عند معلمك بالقرداحة'</li><li>'العوني اذا تمدن متل الجحش اذا تكدن بعمرك شفت عوني بيفهم'</li><li>'لا بس الوطن بدو تكنيس من ل متلك '</li></ul> |
+
  ## Evaluation

  ### Metrics
  | Label | Accuracy |
  |:--------|:---------|
- | **all** | 0.8507 |
+ | **all** | 0.8453 |

  ## Uses

@@ -110,7 +94,7 @@ from setfit import SetFitModel
  # Download from the 🤗 Hub
  model = SetFitModel.from_pretrained("akhooli/setfit_ar_hs")
  # Run inference
- preds = model("وكل مين بدو يشد على مشدو ")
+ preds = model("شيل عينك عن لبنان انت و كل كلب متلك حكايتك و غير هيك انشالله بتنباع بالعزى")
  ```

  <!--
@@ -140,9 +124,9 @@ preds = model("وكل مين بدو يشد على مشدو ")
  ## Training Details

  ### Training Set Metrics
- | Training set | Min | Median | Max |
- |:-------------|:----|:--------|:----|
- | Word count | 1 | 12.7668 | 52 |
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:-------|:----|
+ | Word count | 1 | 12.809 | 52 |

  | Label | Training Sample Count |
  |:---------|:----------------------|
@@ -164,64 +148,64 @@ preds = model("وكل مين بدو يشد على مشدو ")
  - warmup_proportion: 0.1
  - l2_weight: 0.01
  - seed: 42
- - run_name: setfit_hate_2k
+ - run_name: setfit_hate_2kv
  - eval_max_steps: -1
  - load_best_model_at_end: False

  ### Training Results
  | Epoch | Step | Training Loss | Validation Loss |
  |:------:|:----:|:-------------:|:---------------:|
- | 0.0004 | 1 | 0.3158 | - |
- | 0.04 | 100 | 0.2783 | - |
- | 0.08 | 200 | 0.2427 | - |
- | 0.12 | 300 | 0.1803 | - |
- | 0.16 | 400 | 0.1334 | - |
- | 0.2 | 500 | 0.0846 | - |
- | 0.24 | 600 | 0.0638 | - |
- | 0.28 | 700 | 0.05 | - |
- | 0.32 | 800 | 0.0412 | - |
- | 0.36 | 900 | 0.0345 | - |
- | 0.4 | 1000 | 0.0291 | - |
- | 0.44 | 1100 | 0.0232 | - |
- | 0.48 | 1200 | 0.0207 | - |
- | 0.52 | 1300 | 0.0177 | - |
- | 0.56 | 1400 | 0.018 | - |
- | 0.6 | 1500 | 0.0141 | - |
- | 0.64 | 1600 | 0.017 | - |
- | 0.68 | 1700 | 0.0133 | - |
- | 0.72 | 1800 | 0.014 | - |
- | 0.76 | 1900 | 0.0128 | - |
- | 0.8 | 2000 | 0.013 | - |
- | 0.84 | 2100 | 0.0139 | - |
- | 0.88 | 2200 | 0.0132 | - |
- | 0.92 | 2300 | 0.0105 | - |
- | 0.96 | 2400 | 0.008 | - |
- | 1.0 | 2500 | 0.0068 | - |
- | 1.04 | 2600 | 0.0056 | - |
- | 1.08 | 2700 | 0.0072 | - |
- | 1.12 | 2800 | 0.0038 | - |
- | 1.16 | 2900 | 0.005 | - |
+ | 0.0004 | 1 | 0.3239 | - |
+ | 0.04 | 100 | 0.277 | - |
+ | 0.08 | 200 | 0.2406 | - |
+ | 0.12 | 300 | 0.1737 | - |
+ | 0.16 | 400 | 0.1259 | - |
+ | 0.2 | 500 | 0.0701 | - |
+ | 0.24 | 600 | 0.0473 | - |
+ | 0.28 | 700 | 0.0298 | - |
+ | 0.32 | 800 | 0.0239 | - |
+ | 0.36 | 900 | 0.02 | - |
+ | 0.4 | 1000 | 0.0151 | - |
+ | 0.44 | 1100 | 0.0143 | - |
+ | 0.48 | 1200 | 0.0126 | - |
+ | 0.52 | 1300 | 0.0121 | - |
+ | 0.56 | 1400 | 0.0078 | - |
+ | 0.6 | 1500 | 0.0111 | - |
+ | 0.64 | 1600 | 0.0099 | - |
+ | 0.68 | 1700 | 0.0091 | - |
+ | 0.72 | 1800 | 0.0064 | - |
+ | 0.76 | 1900 | 0.0101 | - |
+ | 0.8 | 2000 | 0.0073 | - |
+ | 0.84 | 2100 | 0.0042 | - |
+ | 0.88 | 2200 | 0.0038 | - |
+ | 0.92 | 2300 | 0.0058 | - |
+ | 0.96 | 2400 | 0.0041 | - |
+ | 1.0 | 2500 | 0.0026 | - |
+ | 1.04 | 2600 | 0.0037 | - |
+ | 1.08 | 2700 | 0.0035 | - |
+ | 1.12 | 2800 | 0.0045 | - |
+ | 1.16 | 2900 | 0.0038 | - |
  | 1.2 | 3000 | 0.0039 | - |
- | 1.24 | 3100 | 0.0034 | - |
- | 1.28 | 3200 | 0.0035 | - |
- | 1.32 | 3300 | 0.0038 | - |
- | 1.3600 | 3400 | 0.0038 | - |
- | 1.4 | 3500 | 0.0025 | - |
- | 1.44 | 3600 | 0.0045 | - |
- | 1.48 | 3700 | 0.003 | - |
- | 1.52 | 3800 | 0.0025 | - |
- | 1.56 | 3900 | 0.003 | - |
- | 1.6 | 4000 | 0.0026 | - |
- | 1.6400 | 4100 | 0.0029 | - |
- | 1.6800 | 4200 | 0.0021 | - |
- | 1.72 | 4300 | 0.003 | - |
- | 1.76 | 4400 | 0.0025 | - |
- | 1.8 | 4500 | 0.0032 | - |
- | 1.8400 | 4600 | 0.002 | - |
- | 1.88 | 4700 | 0.0024 | - |
- | 1.92 | 4800 | 0.0022 | - |
- | 1.96 | 4900 | 0.0024 | - |
- | 2.0 | 5000 | 0.0027 | - |
+ | 1.24 | 3100 | 0.0018 | - |
+ | 1.28 | 3200 | 0.003 | - |
+ | 1.32 | 3300 | 0.0028 | - |
+ | 1.3600 | 3400 | 0.0023 | - |
+ | 1.4 | 3500 | 0.0022 | - |
+ | 1.44 | 3600 | 0.0032 | - |
+ | 1.48 | 3700 | 0.0028 | - |
+ | 1.52 | 3800 | 0.0022 | - |
+ | 1.56 | 3900 | 0.0024 | - |
+ | 1.6 | 4000 | 0.0021 | - |
+ | 1.6400 | 4100 | 0.0032 | - |
+ | 1.6800 | 4200 | 0.0026 | - |
+ | 1.72 | 4300 | 0.0025 | - |
+ | 1.76 | 4400 | 0.003 | - |
+ | 1.8 | 4500 | 0.0028 | - |
+ | 1.8400 | 4600 | 0.003 | - |
+ | 1.88 | 4700 | 0.0028 | - |
+ | 1.92 | 4800 | 0.0033 | - |
+ | 1.96 | 4900 | 0.0019 | - |
+ | 2.0 | 5000 | 0.0023 | - |

  ### Framework Versions
  - Python: 3.10.14
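
Note on the README.md change: the usage snippet removed in this commit passed every query through `unicodedata.normalize('NFKC', ...)` before calling `model.predict`, and the replacement README no longer mentions that step. The sketch below only illustrates what NFKC does to one kind of Arabic input, the single-codepoint lam-alef ligature U+FEFB. The helper name `normalize_queries` is hypothetical and is not part of SetFit or of the model card.

```python
from unicodedata import normalize


def normalize_queries(queries):
    """Apply NFKC to each query, as the removed README snippet did.

    Hypothetical helper for illustration only.
    """
    return [normalize("NFKC", query) for query in queries]


# U+FEFB: ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM (one codepoint)
ligature = "\ufefb"
# U+0644 U+0627: ordinary LAM followed by ordinary ALEF (two codepoints)
plain = "\u0644\u0627"

normalized = normalize_queries([ligature])[0]
print(len(ligature), len(normalized))
print(normalized == plain)
```

Whether predictions differ between normalized and unnormalized input is not something this diff shows; callers who relied on the old snippet may want to keep the normalization step.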
config_setfit.json CHANGED
@@ -1,7 +1,7 @@
  {
- "normalize_embeddings": false,
  "labels": [
  "negative",
  "positive"
- ]
+ ],
+ "normalize_embeddings": false
  }
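
Note on the config_setfit.json change: the hunk moves `normalize_embeddings` below `labels` and changes no value. A minimal check, assuming the file is read with a standard JSON parser whose result does not depend on key order, is sketched here. The two strings are compact transcriptions of the old and new file contents shown in the diff.

```python
import json

# Old and new contents of config_setfit.json, transcribed from the diff
old_config = '{"normalize_embeddings": false, "labels": ["negative", "positive"]}'
new_config = '{"labels": ["negative", "positive"], "normalize_embeddings": false}'

old = json.loads(old_config)
new = json.loads(new_config)

# Python dict equality ignores key order, so a pure reordering compares equal
same = old == new
print(same)
```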
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0d2ebdcd4940d5fd3e47d78fc0ab371baa15d3c351cb253ce4aa9ac613e917da
+ oid sha256:aa207876d4a89ac428c7260c57c75272051dfb17bbf88ee51b56bc87c54f9a67
  size 540795752
model_head.pkl CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:39692e6033811b7b9a9fd4c86cdf8015f4ce4af1b7b9f4c901c285fd8465a904
- size 19327
+ oid sha256:49f3e09533da336510f66c9419d4d76468ed0ad3e8378107f08645838e801645
+ size 7007
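
Note on the model.safetensors and model_head.pkl changes: these diffs show Git LFS pointer files rather than the binaries themselves. Each pointer carries a spec `version`, a SHA-256 `oid`, and the object `size` in bytes. The sketch below parses the two model_head.pkl pointers from the diff and compares their sizes. `parse_lfs_pointer` is a hypothetical helper written for this note, not a function from `huggingface_hub` or Git LFS.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>"
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


old_head = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:39692e6033811b7b9a9fd4c86cdf8015f4ce4af1b7b9f4c901c285fd8465a904\n"
    "size 19327\n"
)
new_head = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:49f3e09533da336510f66c9419d4d76468ed0ad3e8378107f08645838e801645\n"
    "size 7007\n"
)

size_delta = new_head["size"] - old_head["size"]
print(size_delta)  # -12320 bytes
```

The model.safetensors pointer keeps the same size (540795752 bytes) and changes only its `oid`, so the weights changed while the file size did not.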