---
license: apache-2.0
language: en
tags:
- deberta-v3-base
- deberta-v3
- deberta
- text-classification
- nli
- natural-language-inference
- multitask
- multi-task
- pipeline
- extreme-multi-task
- extreme-mtl
- tasksource
- zero-shot
- rlhf
model-index:
- name: deberta-v3-base-tasksource-nli
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: glue
      type: glue
      config: rte
      split: validation
    metrics:
    - type: accuracy
      value: 0.89
  - task:
      type: natural-language-inference
      name: Natural Language Inference
    dataset:
      name: anli-r3
      type: anli
      config: plain_text
      split: validation
    metrics:
    - type: accuracy
      value: 0.52
      name: Accuracy
datasets:
- glue
- super_glue
- anli
- tasksource/babi_nli
- sick
- snli
- scitail
- OpenAssistant/oasst1
- universal_dependencies
- hans
- qbao775/PARARULE-Plus
- alisawuffles/WANLI
- metaeval/recast
- sileod/probability_words_nli
- joey234/nan-nli
- pietrolesci/nli_fever
- pietrolesci/breaking_nli
- pietrolesci/conj_nli
- pietrolesci/fracas
- pietrolesci/dialogue_nli
- pietrolesci/mpe
- pietrolesci/dnc
- pietrolesci/gpt3_nli
- pietrolesci/recast_white
- pietrolesci/joci
- martn-nguyen/contrast_nli
- pietrolesci/robust_nli
- pietrolesci/robust_nli_is_sd
- pietrolesci/robust_nli_li_ts
- pietrolesci/gen_debiased_nli
- pietrolesci/add_one_rte
- metaeval/imppres
- pietrolesci/glue_diagnostics
- hlgd
- PolyAI/banking77
- paws
- quora
- medical_questions_pairs
- conll2003
- nlpaueb/finer-139
- Anthropic/hh-rlhf
- Anthropic/model-written-evals
- truthful_qa
- nightingal3/fig-qa
- tasksource/bigbench
- blimp
- cos_e
- cosmos_qa
- dream
- openbookqa
- qasc
- quartz
- quail
- head_qa
- sciq
- social_i_qa
- wiki_hop
- wiqa
- piqa
- hellaswag
- pkavumba/balanced-copa
- 12ml/e-CARE
- art
- tasksource/mmlu
- winogrande
- codah
- ai2_arc
- definite_pronoun_resolution
- swag
- math_qa
- metaeval/utilitarianism
- mteb/amazon_counterfactual
- SetFit/insincere-questions
- SetFit/toxic_conversations
- turingbench/TuringBench
- trec
- tals/vitaminc
- hope_edi
- strombergnlp/rumoureval_2019
- ethos
- tweet_eval
- discovery
- pragmeval
- silicone
- lex_glue
- papluca/language-identification
- imdb
- rotten_tomatoes
- ag_news
- yelp_review_full
- financial_phrasebank
- poem_sentiment
- dbpedia_14
- amazon_polarity
- app_reviews
- hate_speech18
- sms_spam
- humicroedit
- snips_built_in_intents
- banking77
- hate_speech_offensive
- yahoo_answers_topics
- pacovaldez/stackoverflow-questions
- zapsdcn/hyperpartisan_news
- zapsdcn/sciie
- zapsdcn/citation_intent
- go_emotions
- allenai/scicite
- liar
- relbert/lexical_relation_classification
- metaeval/linguisticprobing
- tasksource/crowdflower
- metaeval/ethics
- emo
- google_wellformed_query
- tweets_hate_speech_detection
- has_part
- wnut_17
- ncbi_disease
- acronym_identification
- jnlpba
- species_800
- SpeedOfMagic/ontonotes_english
- blog_authorship_corpus
- launch/open_question_type
- health_fact
- commonsense_qa
- mc_taco
- ade_corpus_v2
- prajjwal1/discosense
- circa
- PiC/phrase_similarity
- copenlu/scientific-exaggeration-detection
- quarel
- mwong/fever-evidence-related
- numer_sense
- dynabench/dynasent
- raquiba/Sarcasm_News_Headline
- sem_eval_2010_task_8
- demo-org/auditor_review
- medmcqa
- aqua_rat
- RuyuanWan/Dynasent_Disagreement
- RuyuanWan/Politeness_Disagreement
- RuyuanWan/SBIC_Disagreement
- RuyuanWan/SChem_Disagreement
- RuyuanWan/Dilemmas_Disagreement
- lucasmccabe/logiqa
- wiki_qa
- metaeval/cycic_classification
- metaeval/cycic_multiplechoice
- metaeval/sts-companion
- metaeval/commonsense_qa_2.0
- metaeval/lingnli
- metaeval/monotonicity-entailment
- metaeval/arct
- metaeval/scinli
- metaeval/naturallogic
- onestop_qa
- demelin/moral_stories
- corypaik/prost
- aps/dynahate
- metaeval/syntactic-augmentation-nli
- metaeval/autotnli
- lasha-nlp/CONDAQA
- openai/webgpt_comparisons
- Dahoas/synthetic-instruct-gptj-pairwise
- metaeval/scruples
- metaeval/wouldyourather
- sileod/attempto-nli
- metaeval/defeasible-nli
- metaeval/help-nli
- metaeval/nli-veridicality-transitivity
- metaeval/natural-language-satisfiability
- metaeval/lonli
- tasksource/dadc-limit-nli
- ColumbiaNLP/FLUTE
- metaeval/strategy-qa
- openai/summarize_from_feedback
- tasksource/folio
- metaeval/tomi-nli
- metaeval/avicenna
- stanfordnlp/SHP
- GBaker/MedQA-USMLE-4-options-hf
- GBaker/MedQA-USMLE-4-options
- sileod/wikimedqa
- declare-lab/cicero
- amydeng2000/CREAK
- metaeval/mutual
- inverse-scaling/NeQA
- inverse-scaling/quote-repetition
- inverse-scaling/redefine-math
- tasksource/puzzte
- metaeval/implicatures
- race
- metaeval/spartqa-yn
- metaeval/spartqa-mchoice
- metaeval/temporal-nli
- metaeval/ScienceQA_text_only
- AndyChiang/cloth
- metaeval/logiqa-2.0-nli
- tasksource/oasst1_dense_flat
- metaeval/boolq-natural-perturbations
- metaeval/path-naturalness-prediction
- riddle_sense
- Jiangjie/ekar_english
- metaeval/implicit-hate-stg1
- metaeval/chaos-mnli-ambiguity
- IlyaGusev/headline_cause
- metaeval/race-c
- metaeval/equate
- metaeval/ambient
- AndyChiang/dgen
- metaeval/clcd-english
- civil_comments
- metaeval/acceptability-prediction
- maximedb/twentyquestions
- metaeval/counterfactually-augmented-snli
- tasksource/I2D2
- sileod/mindgames
- metaeval/counterfactually-augmented-imdb
- metaeval/cnli
- metaeval/reclor
- tasksource/oasst1_pairwise_rlhf_reward
- tasksource/zero-shot-label-nli
- webis/args_me
- webis/Touche23-ValueEval
- tasksource/starcon
- tasksource/ruletaker
- lighteval/lsat_qa
- tasksource/ConTRoL-nli
- tasksource/tracie
- tasksource/sherliic
- tasksource/sen-making
- tasksource/winowhy
- mediabiasgroup/mbib-base
- tasksource/robustLR
- CLUTRR/v1
- tasksource/logical-fallacy
- tasksource/parade
- tasksource/cladder
- tasksource/subjectivity
- tasksource/MOH
- tasksource/VUAC
- tasksource/TroFi
- sharc_modified
- tasksource/conceptrules_v2
- tasksource/disrpt
- conll2000
- DFKI-SLT/few-nerd
- tasksource/com2sense
- tasksource/scone
- tasksource/winodict
- tasksource/fool-me-twice
- tasksource/monli
- tasksource/corr2cause
- tasksource/apt
- zeroshot/twitter-financial-news-sentiment
- tasksource/icl-symbol-tuning-instruct
- tasksource/SpaceNLI
- sihaochen/propsegment
- HannahRoseKirk/HatemojiBuild
- tasksource/regset
- tasksource/babi_nli
- lmsys/chatbot_arena_conversations
metrics:
- accuracy
library_name: transformers
pipeline_tag: zero-shot-classification
---

# Model Card for DeBERTa-v3-small-tasksource-nli

This is [DeBERTa-v3-small](https://hf.co/microsoft/deberta-v3-small) fine-tuned with multi-task learning on 600+ tasks of the [tasksource collection](https://github.com/sileod/tasksource/).
This checkpoint has strong zero-shot validation performance on many tasks, and can be used for:
- Zero-shot entailment-based classification for arbitrary labels [ZS].
- Natural language inference [NLI].
- Hundreds of previous tasks with tasksource-adapters [TA].
- Further fine-tuning on a new task or tasksource task (classification, token classification or multiple-choice) [FT].

# [ZS] Zero-shot classification pipeline
```python
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="sileod/deberta-v3-small-tasksource-nli")

text = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
classifier(text, candidate_labels)
```
The NLI training data of this model includes [label-nli](https://huggingface.co/datasets/tasksource/zero-shot-label-nli), an NLI dataset constructed specifically to improve this kind of zero-shot classification.
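
Under the hood, each candidate label is turned into an entailment hypothesis. If the default wording does not fit your labels, the pipeline's `hypothesis_template` argument lets you change it. A minimal sketch; the template string below is only an illustration, not a value prescribed by this model:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="sileod/deberta-v3-small-tasksource-nli")
# Hypothetical template: any sentence with a {} placeholder for the label works.
classifier("one day I will see the world",
           candidate_labels=['travel', 'cooking', 'dancing'],
           hypothesis_template="This example is about {}.")
```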

# [NLI] Natural language inference pipeline

```python
from transformers import pipeline
pipe = pipeline("text-classification", model="sileod/deberta-v3-small-tasksource-nli")
pipe([dict(text='there is a cat',
           text_pair='there is a black cat')])  # list of (premise, hypothesis) pairs
# [{'label': 'neutral', 'score': 0.9952911138534546}]
```
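If you need the probabilities of all three NLI classes rather than only the top label, you can call the model directly. A minimal sketch using the standard `AutoModelForSequenceClassification` API; the class order is read from the checkpoint's `id2label` mapping rather than hard-coded:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "sileod/deberta-v3-small-tasksource-nli"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Encode a (premise, hypothesis) pair and take a softmax over the NLI logits.
inputs = tokenizer("there is a cat", "there is a black cat", return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(-1).squeeze()
print({model.config.id2label[i]: round(p.item(), 3) for i, p in enumerate(probs)})
```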

# [TA] Tasksource-adapters: 1 line access to hundreds of tasks

```python
# !pip install tasknet
import tasknet as tn
pipe = tn.load_pipeline('sileod/deberta-v3-small-tasksource-nli', 'glue/sst2')  # works for 500+ tasksource tasks
pipe(['That movie was great !', 'Awful movie.'])
# [{'label': 'positive', 'score': 0.9956}, {'label': 'negative', 'score': 0.9967}]
```
The list of supported tasks is available in the model's config.json.
This is more efficient than ZS since it requires only one forward pass per example, but it is less flexible.
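
You can peek at that task metadata without downloading the weights. A small sketch; the exact names of the extra config fields are an assumption, so it simply prints the config keys that look task- or label-related:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("sileod/deberta-v3-small-tasksource-nli").to_dict()
# Field names may differ; list whatever looks like task metadata.
print([k for k in config if "task" in k.lower() or "label" in k.lower()])
```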

# [FT] Tasknet: 3 lines fine-tuning

```python
# !pip install tasknet
import tasknet as tn
hparams = dict(model_name='sileod/deberta-v3-small-tasksource-nli', learning_rate=2e-5)
model, trainer = tn.Model_Trainer([tn.AutoTask("glue/rte")], hparams)
trainer.train()
```
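
If you prefer not to depend on tasknet, the same kind of fine-tuning works with the plain transformers `Trainer`. A minimal sketch, not taken from this card: it re-initializes the classification head for the new label set (the output directory name and hyperparameters are illustrative):

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "sileod/deberta-v3-small-tasksource-nli"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# RTE has 2 labels; drop the 3-way NLI head and start a fresh one.
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=2, ignore_mismatched_sizes=True)

dataset = load_dataset("glue", "rte")
def tokenize(batch):
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)
dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments("rte-finetune", learning_rate=2e-5,
                           per_device_train_batch_size=16, num_train_epochs=3),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,
)
trainer.train()
```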

## Evaluation
This model ranked 1st among all models with the microsoft/deberta-v3-base architecture according to the IBM model recycling evaluation.
https://ibm.github.io/model-recycling/

### Software and training details

The model was trained on 600 tasks for 200k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 12 days on an Nvidia A30 24GB GPU.
This is the shared multi-task model with the MNLI classifier on top. Each task had a specific CLS embedding, which was dropped 10% of the time to facilitate using the model without it. All multiple-choice tasks used the same classification layers. For classification tasks, classification heads were shared between tasks whose labels matched.
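
A conceptual sketch of that sharing scheme (written for illustration only, not the released training code): one shared encoder, a per-task CLS embedding that is dropped 10% of the time during training, and classification heads reused across tasks whose label sets match.

```python
# Hypothetical illustration of the weight-sharing described above.
import random
import torch
from torch import nn

class MultiTaskModel(nn.Module):
    def __init__(self, encoder, hidden_size, tasks):
        # tasks: dict mapping task name -> tuple of label names
        super().__init__()
        self.encoder = encoder
        self.task_cls = nn.ParameterDict(
            {name: nn.Parameter(torch.zeros(hidden_size)) for name in tasks})
        self.heads, self.task_to_head = nn.ModuleDict(), {}
        for name, labels in tasks.items():
            key = "|".join(labels)  # one head per distinct label set
            if key not in self.heads:
                self.heads[key] = nn.Linear(hidden_size, len(labels))
            self.task_to_head[name] = key

    def forward(self, inputs_embeds, attention_mask, task):
        # Add the task-specific CLS embedding, except 10% of the time in training.
        if not (self.training and random.random() < 0.1):
            inputs_embeds = inputs_embeds.clone()
            inputs_embeds[:, 0] = inputs_embeds[:, 0] + self.task_cls[task]
        hidden = self.encoder(inputs_embeds=inputs_embeds,
                              attention_mask=attention_mask).last_hidden_state
        return self.heads[self.task_to_head[task]](hidden[:, 0])
```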

https://github.com/sileod/tasksource/ \
https://github.com/sileod/tasknet/ \
Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing

# Citation

More details in this [article](https://arxiv.org/abs/2301.05948):
```
@article{sileo2023tasksource,
  title={tasksource: Structured Dataset Preprocessing Annotations for Frictionless Extreme Multi-Task Learning and Evaluation},
  author={Sileo, Damien},
  url={https://arxiv.org/abs/2301.05948},
  journal={arXiv preprint arXiv:2301.05948},
  year={2023}
}
```

# Model Card Contact