Ihor committed
Commit dafd89b · 1 Parent(s): fa2cdd7

Update README.md

Add basic description of the model and ways to use it.

Files changed (1)
  1. README.md +102 -0
README.md CHANGED
@@ -1,3 +1,105 @@
  ---
  license: apache-2.0
+ datasets:
+ - multi_nli
+ - xnli
+ - dbpedia_14
+ - SetFit/bbc-news
+ - squad_v2
+ - race
+ language:
+ - en
+ metrics:
+ - accuracy
+ - f1
+ library_name: transformers
+ pipeline_tag: zero-shot-classification
+ tags:
+ - classification
+ - information-extraction
+ - zero-shot
  ---
+
+ **comprehend_it-base**
+
+ This is a model based on [DeBERTaV3-base](https://huggingface.co/microsoft/deberta-v3-base) that was trained on natural language inference datasets as well as on multiple text classification datasets.
+
+ It demonstrates better quality than [Bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli) on a diverse set of text classification datasets in a zero-shot setting, while being almost 3 times smaller.
+
+ Moreover, the model can be used for multiple information extraction tasks in a zero-shot setting (an entity-typing example is sketched after the pipeline examples below), including:
+ * Named-entity recognition;
+ * Relation extraction;
+ * Entity linking;
+ * Question-answering.
+
+ #### With the zero-shot classification pipeline
+
+ The model can be loaded with the `zero-shot-classification` pipeline like so:
+
+ ```python
+ from transformers import pipeline
+ classifier = pipeline("zero-shot-classification",
+                       model="knowledgator/comprehend_it-base")
+ ```
+
+ You can then use this pipeline to classify sequences into any of the class names you specify.
+
+ ```python
+ sequence_to_classify = "one day I will see the world"
+ candidate_labels = ['travel', 'cooking', 'dancing']
+ classifier(sequence_to_classify, candidate_labels)
+ # example output (exact scores vary by checkpoint):
+ #{'labels': ['travel', 'dancing', 'cooking'],
+ # 'scores': [0.9938651323318481, 0.0032737774308770895, 0.002861034357920289],
+ # 'sequence': 'one day I will see the world'}
+ ```
+
+ If more than one candidate label can be correct, pass `multi_label=True` so that each class is scored independently:
+
+ ```python
+ candidate_labels = ['travel', 'cooking', 'dancing', 'exploration']
+ classifier(sequence_to_classify, candidate_labels, multi_label=True)
+ #{'labels': ['travel', 'exploration', 'dancing', 'cooking'],
+ # 'scores': [0.9945111274719238,
+ #  0.9383890628814697,
+ #  0.0057061901316046715,
+ #  0.0018193122232332826],
+ # 'sequence': 'one day I will see the world'}
+ ```
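+
+ The same pipeline can be repurposed for the information extraction tasks listed above by phrasing the task as classification over candidate labels. Below is a minimal, illustrative sketch of zero-shot entity typing; the label set and `hypothesis_template` here are assumptions for the example, not a fixed recipe:
+
+ ```python
+ from transformers import pipeline
+
+ classifier = pipeline("zero-shot-classification",
+                       model="knowledgator/comprehend_it-base")
+
+ text = "Apple was founded by Steve Jobs in Cupertino."
+ # Type a candidate entity span by asking which label fits it best.
+ entity_types = ['organization', 'person', 'location', 'product']
+ classifier(text, entity_types,
+            hypothesis_template="'Apple' in this example is a {}.")
+ # The top-ranked label gives the predicted type for the span 'Apple'.
+ ```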
+
+
+ #### With manual PyTorch
+
+ ```python
+ # pose the sequence as an NLI premise and the label as a hypothesis
+ import torch
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+ nli_model = AutoModelForSequenceClassification.from_pretrained('knowledgator/comprehend_it-base').to(device)
+ tokenizer = AutoTokenizer.from_pretrained('knowledgator/comprehend_it-base')
+
+ sequence = "one day I will see the world"
+ label = "travel"
+
+ premise = sequence
+ hypothesis = f'This example is {label}.'
+
+ # run through the model pre-trained on NLI data
+ x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
+                      truncation='only_first')
+ logits = nli_model(x.to(device))[0]
+
+ # we throw away "neutral" (dim 1) and take the probability of
+ # "entailment" (2) as the probability of the label being true
+ # (check nli_model.config.id2label to confirm the index order for this checkpoint)
+ entail_contradiction_logits = logits[:, [0, 2]]
+ probs = entail_contradiction_logits.softmax(dim=1)
+ prob_label_is_true = probs[:, 1]
+ ```
+
+ ### Benchmarking
+
+ | Model | IMDB | AG_NEWS | Emotions |
+ |-----------------------------|------|---------|----------|
+ | [Bart-large-mnli (407 M)](https://huggingface.co/facebook/bart-large-mnli) | 0.89 | 0.6887 | 0.3765 |
+ | [Deberta-base-v3 (184 M)](https://huggingface.co/cross-encoder/nli-deberta-v3-base) | 0.85 | 0.6455 | 0.5095 |
+ | Comprehendo (184 M) | 0.90 | 0.7982 | 0.5660 |
+
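+ As a rough illustration of how such zero-shot scores can be reproduced, the sketch below evaluates the pipeline on a small IMDB sample; the label names and sample size are illustrative assumptions, not the exact protocol behind the table above:
+
+ ```python
+ from datasets import load_dataset
+ from transformers import pipeline
+
+ classifier = pipeline("zero-shot-classification",
+                       model="knowledgator/comprehend_it-base")
+
+ # small random sample of the IMDB test split to keep the run fast
+ dataset = load_dataset("imdb", split="test").shuffle(seed=42).select(range(100))
+ candidate_labels = ["negative", "positive"]  # list index matches the dataset's 0/1 labels
+
+ correct = 0
+ for example in dataset:
+     prediction = classifier(example["text"], candidate_labels)["labels"][0]
+     correct += int(candidate_labels.index(prediction) == example["label"])
+
+ print("accuracy:", correct / len(dataset))
+ ```
+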
+ ### Further reading
+ Check out our blog post, ["The new milestone in zero-shot capabilities (it’s not Generative AI)"](https://medium.com/p/9b5a081fbf27), where we highlight possible use cases of the model and explain why next-token prediction is not the only way to achieve impressive zero-shot capabilities.
+ While most of the AI industry is focused on generative AI and decoder-based models, we are committed to developing encoder-based models.
+ We aim to achieve the same level of generalization for such models as their decoder counterparts. Encoders have several wonderful properties, such as bidirectional attention, and they are the best choice for many information extraction tasks in terms of efficiency and controllability.