Ihor committed
Commit dafd89b · 1 Parent(s): fa2cdd7

Update README.md

Add basic description of the model and ways to use it.

Files changed (1)
  1. README.md +102 -0
README.md CHANGED
@@ -1,3 +1,105 @@
  ---
  license: apache-2.0
+ datasets:
+ - multi_nli
+ - xnli
+ - dbpedia_14
+ - SetFit/bbc-news
+ - squad_v2
+ - race
+ language:
+ - en
+ metrics:
+ - accuracy
+ - f1
+ library_name: transformers
+ pipeline_tag: zero-shot-classification
+ tags:
+ - classification
+ - information-extraction
+ - zero-shot
  ---
+
+ **comprehend_it-base**
+
+ This is a model based on [DeBERTaV3-base](https://huggingface.co/microsoft/deberta-v3-base) that was trained on natural language inference datasets as well as on multiple text classification datasets.
+
+ It demonstrates better quality than [Bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli) on a diverse set of text classification datasets in a zero-shot setting, while being almost 3 times smaller.
+
+ Moreover, the model can be used for multiple information extraction tasks in a zero-shot setting (an entity-typing example is sketched after the pipeline examples below), including:
+ * Named-entity recognition;
+ * Relation extraction;
+ * Entity linking;
+ * Question-answering.
+
+ #### With the zero-shot classification pipeline
+
+ The model can be loaded with the `zero-shot-classification` pipeline like so:
+
+ ```python
+ from transformers import pipeline
+ classifier = pipeline("zero-shot-classification",
+                       model="knowledgator/comprehend_it-base")
+ ```
+
+ You can then use this pipeline to classify sequences into any of the class names you specify.
+
+ ```python
+ sequence_to_classify = "one day I will see the world"
+ candidate_labels = ['travel', 'cooking', 'dancing']
+ classifier(sequence_to_classify, candidate_labels)
+ # example output (exact scores vary by checkpoint):
+ #{'labels': ['travel', 'dancing', 'cooking'],
+ # 'scores': [0.9938651323318481, 0.0032737774308770895, 0.002861034357920289],
+ # 'sequence': 'one day I will see the world'}
+ ```
+
+ If more than one candidate label can be correct, pass `multi_label=True` so that each class is scored independently:
+
+ ```python
+ candidate_labels = ['travel', 'cooking', 'dancing', 'exploration']
+ classifier(sequence_to_classify, candidate_labels, multi_label=True)
+ #{'labels': ['travel', 'exploration', 'dancing', 'cooking'],
+ # 'scores': [0.9945111274719238,
+ #  0.9383890628814697,
+ #  0.0057061901316046715,
+ #  0.0018193122232332826],
+ # 'sequence': 'one day I will see the world'}
+ ```
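+
+ The same pipeline can be repurposed for the information extraction tasks listed above by phrasing the task as classification over candidate labels. Below is a minimal, illustrative sketch of zero-shot entity typing; the label set and `hypothesis_template` here are assumptions for the example, not a fixed recipe:
+
+ ```python
+ from transformers import pipeline
+
+ classifier = pipeline("zero-shot-classification",
+                       model="knowledgator/comprehend_it-base")
+
+ text = "Apple was founded by Steve Jobs in Cupertino."
+ # Type a candidate entity span by asking which label fits it best.
+ entity_types = ['organization', 'person', 'location', 'product']
+ classifier(text, entity_types,
+            hypothesis_template="'Apple' in this example is a {}.")
+ # The top-ranked label gives the predicted type for the span 'Apple'.
+ ```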
+
+
+ #### With manual PyTorch
+
+ ```python
+ # pose the sequence as an NLI premise and the label as a hypothesis
+ import torch
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+ nli_model = AutoModelForSequenceClassification.from_pretrained('knowledgator/comprehend_it-base').to(device)
+ tokenizer = AutoTokenizer.from_pretrained('knowledgator/comprehend_it-base')
+
+ sequence = "one day I will see the world"
+ label = "travel"
+
+ premise = sequence
+ hypothesis = f'This example is {label}.'
+
+ # run through the model pre-trained on NLI data
+ x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
+                      truncation='only_first')
+ logits = nli_model(x.to(device))[0]
+
+ # we throw away "neutral" (dim 1) and take the probability of
+ # "entailment" (2) as the probability of the label being true
+ # (check nli_model.config.id2label to confirm the index order for this checkpoint)
+ entail_contradiction_logits = logits[:, [0, 2]]
+ probs = entail_contradiction_logits.softmax(dim=1)
+ prob_label_is_true = probs[:, 1]
+ ```
+
+ ### Benchmarking
+
+ | Model | IMDB | AG_NEWS | Emotions |
+ |-----------------------------|------|---------|----------|
+ | [Bart-large-mnli (407 M)](https://huggingface.co/facebook/bart-large-mnli) | 0.89 | 0.6887 | 0.3765 |
+ | [Deberta-base-v3 (184 M)](https://huggingface.co/cross-encoder/nli-deberta-v3-base) | 0.85 | 0.6455 | 0.5095 |
+ | Comprehendo (184 M) | 0.90 | 0.7982 | 0.5660 |
+
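+ As a rough illustration of how such zero-shot scores can be reproduced, the sketch below evaluates the pipeline on a small IMDB sample; the label names and sample size are illustrative assumptions, not the exact protocol behind the table above:
+
+ ```python
+ from datasets import load_dataset
+ from transformers import pipeline
+
+ classifier = pipeline("zero-shot-classification",
+                       model="knowledgator/comprehend_it-base")
+
+ # small random sample of the IMDB test split to keep the run fast
+ dataset = load_dataset("imdb", split="test").shuffle(seed=42).select(range(100))
+ candidate_labels = ["negative", "positive"]  # list index matches the dataset's 0/1 labels
+
+ correct = 0
+ for example in dataset:
+     prediction = classifier(example["text"], candidate_labels)["labels"][0]
+     correct += int(candidate_labels.index(prediction) == example["label"])
+
+ print("accuracy:", correct / len(dataset))
+ ```
+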
+ ### Further reading
+ Check out our blog post, ["The new milestone in zero-shot capabilities (it’s not Generative AI)"](https://medium.com/p/9b5a081fbf27), where we highlight possible use cases of the model and explain why next-token prediction is not the only way to achieve impressive zero-shot capabilities.
+ While most of the AI industry is focused on generative AI and decoder-based models, we are committed to developing encoder-based models.
+ We aim to achieve the same level of generalization for such models as their decoder counterparts. Encoders have several wonderful properties, such as bidirectional attention, and they are the best choice for many information extraction tasks in terms of efficiency and controllability.