# Pipeline

This is the most basic (and highest-level) object in the Hugging Face `transformers` library. It is a one-stop object that handles everything under the hood and hides much of the complexity of the task at hand, such as `tokenization`, `preprocessing` and `postprocessing`.
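
Conceptually, a pipeline chains three stages: preprocessing (tokenization), a model forward pass, and postprocessing of raw scores into readable labels. The toy sketch below mimics that flow with stdlib Python only. The class name `ToySentimentPipeline`, the word weights standing in for a trained model, and the tokenizer are all hypothetical. The real `transformers` pipeline classes are organised around similar `preprocess`, `_forward` and `postprocess` steps, but this is an illustration, not their implementation.

```python
import math

class ToySentimentPipeline:
    """Hypothetical stand-in for a sentiment pipeline: preprocess -> forward -> postprocess."""

    # Made-up word weights playing the role of a trained model
    WEIGHTS = {"wonderful": 2.0, "great": 1.5, "good": 1.0,
               "sucks": -2.0, "heck": -1.0, "procrastinate": -1.0}
    LABELS = ["NEGATIVE", "POSITIVE"]

    def preprocess(self, text):
        # "Tokenization": lowercase and split on anything that is not a letter
        cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
        return cleaned.split()

    def forward(self, tokens):
        # "Model": sum the word weights into a positive-class logit;
        # the negative class gets the mirrored logit
        score = sum(self.WEIGHTS.get(tok, 0.0) for tok in tokens)
        return [-score, score]

    def postprocess(self, logits):
        # Softmax the logits and report the most likely label with its probability
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        return {"label": self.LABELS[best], "score": probs[best]}

    def __call__(self, inputs):
        # Accept one string or a list of strings; always return a list of results
        texts = [inputs] if isinstance(inputs, str) else list(inputs)
        return [self.postprocess(self.forward(self.preprocess(t))) for t in texts]

pipe = ToySentimentPipeline()
# First result is POSITIVE, second is NEGATIVE
print(pipe(["It is a wonderful day today", "What the heck, this software sucks!!"]))
```

The point of the design is that the caller only ever sees `pipe(...)`; swapping the tokenizer or the model does not change how the pipeline is called.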

In [1]:
from transformers import pipeline
classifier = pipeline(task = "sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [2]:
sentences = [
 "I have been sleeping a lot lately. Wish I could do more and procrastinate less",
 "It is a wonderful day today",
 "What the heck, this software sucks!!"
]

In [3]:
classifier(sentences)

[{'label': 'NEGATIVE', 'score': 0.9991617202758789},
 {'label': 'POSITIVE', 'score': 0.999890923500061},
 {'label': 'NEGATIVE', 'score': 0.9995805621147156}]

## Zero-Shot Classification

In [4]:
sentences = [
 "Rahul Dravid was a great coach and led India to win the world cup in 2024",
 "What is a transformer? It is a black box neural network model which can be used to do stuff with sequences",
 "How can one understand the meaning of life? It is not so simple",
 "Shaun had a great insight right in the middle of a surgery"
]

labels = ["Sports", "Education", "Other"]

In [5]:
classifier = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [7]:
classifier(sequences = sentences, candidate_labels = labels)

[{'sequence': 'Rahul Dravid was a great coach and led India to win the world cup in 2024',
 'labels': ['Sports', 'Other', 'Education'],
 'scores': [0.967433512210846, 0.025695420801639557, 0.006871006917208433]},
 {'sequence': 'What is a transformer? It is a black box neural network model which can be used to do stuff with sequences',
 'labels': ['Other', 'Education', 'Sports'],
 'scores': [0.776347279548645, 0.11728236079216003, 0.10637037456035614]},
 {'sequence': 'How can one understand the meaning of life? It is not so simple',
 'labels': ['Other', 'Education', 'Sports'],
 'scores': [0.8647233247756958, 0.08910410851240158, 0.046172577887773514]},
 {'sequence': 'Shaun had a great insight right in the middle of a surgery',
 'labels': ['Other', 'Sports', 'Education'],
 'scores': [0.7419394850730896, 0.18247079849243164, 0.07558975368738174]}]
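
Each row of scores above sums to 1. With the default NLI model (`facebook/bart-large-mnli`), the zero-shot pipeline turns every candidate label into a hypothesis (by default `"This example is {}."`, set via `hypothesis_template`), scores how strongly the sentence entails it, and, when `multi_label=False`, applies a softmax to the entailment logits across labels. The sketch below reproduces only that last ranking step; the function name and the entailment logits are invented for illustration, not real model output.

```python
import math

def zero_shot_scores(sequence, candidate_labels, entailment_logits,
                     hypothesis_template="This example is {}."):
    """Single-label zero-shot ranking from per-label entailment logits.

    entailment_logits[i] stands in for what an NLI model would output for the
    pair (sequence, hypothesis_template.format(candidate_labels[i])).
    """
    if len(entailment_logits) != len(candidate_labels):
        raise ValueError("need exactly one entailment logit per candidate label")
    # One hypothesis per label: these are the sentences an NLI model would score
    hypotheses = [hypothesis_template.format(label) for label in candidate_labels]
    # Softmax across labels, so the scores sum to 1
    m = max(entailment_logits)
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    # Sort from most to least likely, mirroring the pipeline's output order
    ranked = sorted(zip(candidate_labels, scores), key=lambda pair: pair[1], reverse=True)
    return {
        "sequence": sequence,
        "labels": [label for label, _ in ranked],
        "scores": [score for _, score in ranked],
        "hypotheses": hypotheses,  # extra key for illustration; not in the real output
    }

result = zero_shot_scores(
    "Rahul Dravid was a great coach and led India to win the world cup in 2024",
    ["Sports", "Education", "Other"],
    entailment_logits=[3.0, -1.0, 0.5],  # invented logits, not real model output
)
print(result["labels"])  # ['Sports', 'Other', 'Education']
```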

## Text Generation

### Using default model

In [8]:
generator = pipeline(task = "text-generation")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [9]:
seed_text = "Dhoni finishes off in style and the entire Indian team"

In [11]:
generator(text_inputs = seed_text)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Dhoni finishes off in style and the entire Indian team look forward to meeting him at home to continue their efforts towards an unbeaten run in this World Cup.'}]

In [12]:
generator(text_inputs = seed_text, num_return_sequences = 3, max_length = 30)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Dhoni finishes off in style and the entire Indian team is delighted with his victory\n\nIndia have failed to impress Pakistan's Ranji Trophy winner"},
 {'generated_text': "Dhoni finishes off in style and the entire Indian team goes to great lengths to make him comfortable. It's a very important decision for the first"},
 {'generated_text': 'Dhoni finishes off in style and the entire Indian team is immediately in a good position to secure victory.\n\nA few weeks from now,'}]
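
`max_length` caps the total length in tokens with the prompt included, not the number of new words, and `num_return_sequences` repeats the sampling to get several continuations. (Newer releases also accept `max_new_tokens`, which counts only the generated part.) The `Setting pad_token_id to eos_token_id:50256` message is informational: GPT-2 has no padding token, so generation reuses its end-of-sequence token. The toy sketch below shows the autoregressive loop; the bigram table standing in for GPT-2 and the use of whitespace-separated words as "tokens" are simplifying assumptions.

```python
import random

# Hypothetical next-word table standing in for a real language model
BIGRAMS = {
    "team": ["is", "looks", "goes"],
    "is": ["delighted", "ready"],
    "looks": ["happy", "ready"],
    "goes": ["home"],
    "delighted": ["with"],
    "with": ["the"],
    "the": ["win", "crowd"],
    "ready": ["to"],
    "to": ["celebrate", "win"],
}
EOS = "<eos>"

def generate(prompt, max_length=30, num_return_sequences=1, seed=0):
    """Sample continuations word by word, stopping at EOS or max_length words (prompt included)."""
    prompt_tokens = prompt.split()
    if not prompt_tokens:
        raise ValueError("prompt must contain at least one word")
    rng = random.Random(seed)
    outputs = []
    for _ in range(num_return_sequences):
        tokens = list(prompt_tokens)
        while len(tokens) < max_length:
            # Condition only on the last word; words missing from the table end the sequence
            next_token = rng.choice(BIGRAMS.get(tokens[-1], [EOS]))
            if next_token == EOS:
                break
            tokens.append(next_token)
        outputs.append({"generated_text": " ".join(tokens)})
    return outputs

seed_text = "Dhoni finishes off in style and the entire Indian team"
for out in generate(seed_text, max_length=16, num_return_sequences=3, seed=1):
    print(out["generated_text"])
```

Because the length budget includes the prompt, a `max_length` no larger than the prompt leaves no room for new tokens at all.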

### Using a specific model from the Hugging Face Hub

In [13]:
generator = pipeline("text-generation", model = "distilgpt2")

generator(text_inputs= seed_text, num_return_sequences = 3, max_length = 30)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Dhoni finishes off in style and the entire Indian team has their legs.\n\n\nThe match between the West Indian and the Americans was the'},
 {'generated_text': 'Dhoni finishes off in style and the entire Indian team is preparing to compete on October 31st.\n\nThe squad of India is made up'},
 {'generated_text': 'Dhoni finishes off in style and the entire Indian team looks happy to be back as usual this term," he added.'}]

## Mask Filling

In [14]:
filler = pipeline("fill-mask")

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [16]:
filler("How deep is your <mask>?", top_k = 5)  # distilroberta-base uses <mask> as its mask token

[{'score': 0.07598453760147095,
 'token': 6943,
 'token_str': ' depression',
 'sequence': 'How deep is your depression?'},
 {'score': 0.035246096551418304,
 'token': 12172,
 'token_str': ' bubble',
 'sequence': 'How deep is your bubble?'},
 {'score': 0.027820784598588943,
 'token': 7530,
 'token_str': ' addiction',
 'sequence': 'How deep is your addiction?'},
 {'score': 0.014877567999064922,
 'token': 4683,
 'token_str': ' hole',
 'sequence': 'How deep is your hole?'},
 {'score': 0.013593271374702454,
 'token': 1144,
 'token_str': ' heart',
 'sequence': 'How deep is your heart?'}]
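
Each `score` is a probability from a softmax over the whole vocabulary at the masked position, and `top_k` keeps the k most probable tokens and splices each one back into the sentence. The sketch below mimics that with a tiny invented vocabulary and invented logits; the `fill_mask` helper is hypothetical, and its `token` field is an index into the toy vocabulary rather than a real token id. RoBERTa-style tokens carry a leading space (` depression` above), which the sketch strips before splicing.

```python
import heapq
import math

def fill_mask(template, mask_token, vocab, logits, top_k=5):
    """Score every vocabulary entry for the masked slot and keep the top_k."""
    # Softmax over the whole (toy) vocabulary at the masked position
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k most probable tokens, best first
    best_ids = heapq.nlargest(top_k, range(len(vocab)), key=probs.__getitem__)
    return [
        {
            "score": probs[i],
            "token": i,  # index into the toy vocab, not a real tokenizer id
            "token_str": vocab[i],
            # Strip the BPE-style leading space so it doesn't double the one in the template
            "sequence": template.replace(mask_token, vocab[i].strip(), 1),
        }
        for i in best_ids
    ]

# Invented vocabulary and logits for the masked position
vocab = [" depression", " bubble", " addiction", " hole", " heart", " banana"]
logits = [2.0, 1.3, 1.0, 0.3, 0.2, -3.0]
for row in fill_mask("How deep is your <mask>?", "<mask>", vocab, logits, top_k=3):
    print(round(row["score"], 3), row["sequence"])
```

The same helper works for a BERT-style template by passing `"[MASK]"` as `mask_token`, which is why the next cell switches the placeholder when it switches the model.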

In [17]:
filler = pipeline("fill-mask", model = "bert-base-cased")

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [18]:
filler("How deep is your [MASK]?", top_k = 5)

[{'score': 0.0551474466919899,
 'token': 1762,
 'token_str': 'heart',
 'sequence': 'How deep is your heart?'},
 {'score': 0.04252220690250397,
 'token': 5785,
 'token_str': 'wound',
 'sequence': 'How deep is your wound?'},
 {'score': 0.038988541811704636,
 'token': 3960,
 'token_str': 'soul',
 'sequence': 'How deep is your soul?'},
 {'score': 0.03589598089456558,
 'token': 2922,
 'token_str': 'throat',
 'sequence': 'How deep is your throat?'},
 {'score': 0.0302369873970747,
 'token': 1567,
 'token_str': 'love',
 'sequence': 'How deep is your love?'}]

## Named Entity Recognition (NER)

In [19]:
ner = pipeline(task = "ner", grouped_entities = True)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [20]:
ner("Hey everyone, please welcome, the chief guest for tonight: Mr. Sachin Tendulkar from the Indian Cricket Team")

[{'entity_group': 'PER',
 'score': 0.9884488,
 'word': 'Sachin Tendulkar',
 'start': 63,
 'end': 79},
 {'entity_group': 'ORG',
 'score': 0.9564063,
 'word': 'Indian Cricket Team',
 'start': 89,
 'end': 108}]

In [21]:
ner = pipeline(task = "ner", grouped_entities = False)
ner("Hey everyone, please welcome, the chief guest for tonight: Mr. Sachin Tendulkar from the Indian Cricket Team")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'entity': 'I-PER',
 'score': 0.9995166,
 'index': 15,
 'word': 'Sa',
 'start': 63,
 'end': 65},
 {'entity': 'I-PER',
 'score': 0.9992397,
 'index': 16,
 'word': '##chin',
 'start': 65,
 'end': 69},
 {'entity': 'I-PER',
 'score': 0.99916065,
 'index': 17,
 'word': 'Ten',
 'start': 70,
 'end': 73},
 {'entity': 'I-PER',
 'score': 0.9957129,
 'index': 18,
 'word': '##du',
 'start': 73,
 'end': 75},
 {'entity': 'I-PER',
 'score': 0.9410511,
 'index': 19,
 'word': '##lk',
 'start': 75,
 'end': 77},
 {'entity': 'I-PER',
 'score': 0.99601185,
 'index': 20,
 'word': '##ar',
 'start': 77,
 'end': 79},
 {'entity': 'I-ORG',
 'score': 0.9637556,
 'index': 23,
 'word': 'Indian',
 'start': 89,
 'end': 95},
 {'entity': 'I-ORG',
 'score': 0.9248884,
 'index': 24,
 'word': 'Cricket',
 'start': 96,
 'end': 103},
 {'entity': 'I-ORG',
 'score': 0.98057497,
 'index': 25,
 'word': 'Team',
 'start': 104,
 'end': 108}]
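
Comparing the two outputs shows what `grouped_entities=True` does (newer releases spell it `aggregation_strategy="simple"`): adjacent sub-word predictions of the same entity type are merged into one entity, WordPiece `##` pieces are glued back together, the span runs from the first token's `start` to the last token's `end`, and the scores are averaged. The sketch below is a simplified, hypothetical re-implementation working on the ungrouped output; treating a gap in `index` as a group boundary is an assumption needed because the pipeline drops non-entity (`O`) tokens from this output.

```python
def group_entities(tokens):
    """Merge adjacent sub-word predictions of the same entity type.

    Simplified 'simple'-style aggregation: a new group starts on a B- tag, a
    change of entity type, or a gap in token indices; scores are averaged.
    """
    groups = []
    for tok in tokens:
        prefix, sep, entity_type = tok["entity"].partition("-")
        if not sep:  # label without a B-/I- prefix, e.g. "PER"
            prefix, entity_type = "I", prefix
        current = groups[-1] if groups else None
        if (current is None or prefix == "B"
                or entity_type != current["entity_group"]
                or tok["index"] != current["last_index"] + 1):
            current = {"entity_group": entity_type, "scores": [], "word": "",
                       "start": tok["start"], "end": tok["end"], "last_index": tok["index"]}
            groups.append(current)
        # WordPiece continuation pieces ("##chin") attach without a space
        piece = tok["word"]
        if piece.startswith("##"):
            current["word"] += piece[2:]
        else:
            current["word"] += (" " if current["word"] else "") + piece
        current["scores"].append(tok["score"])
        current["end"] = tok["end"]
        current["last_index"] = tok["index"]
    # Average the per-token scores and drop the bookkeeping fields
    return [
        {"entity_group": g["entity_group"],
         "score": sum(g["scores"]) / len(g["scores"]),
         "word": g["word"], "start": g["start"], "end": g["end"]}
        for g in groups
    ]
```

Fed the nine ungrouped predictions above, this yields `Sachin Tendulkar` (characters 63 to 79) and `Indian Cricket Team` (89 to 108), and the averaged scores agree with the grouped output (0.9884488 and 0.9564063) up to float rounding.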

In [25]:
# "ner" is an alias for the "token-classification" task, so this loads the same default NER model
token_classifier = pipeline(task = "token-classification")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [28]:
token_classifier("My name is Sylvain and I work at Hugging Face in Brooklyn.")

[{'entity': 'I-PER',
 'score': 0.99938285,
 'index': 4,
 'word': 'S',
 'start': 11,
 'end': 12},
 {'entity': 'I-PER',
 'score': 0.99815494,
 'index': 5,
 'word': '##yl',
 'start': 12,
 'end': 14},
 {'entity': 'I-PER',
 'score': 0.9959072,
 'index': 6,
 'word': '##va',
 'start': 14,
 'end': 16},
 {'entity': 'I-PER',
 'score': 0.99923277,
 'index': 7,
 'word': '##in',
 'start': 16,
 'end': 18},
 {'entity': 'I-ORG',
 'score': 0.9738931,
 'index': 12,
 'word': 'Hu',
 'start': 33,
 'end': 35},
 {'entity': 'I-ORG',
 'score': 0.97611505,
 'index': 13,
 'word': '##gging',
 'start': 35,
 'end': 40},
 {'entity': 'I-ORG',
 'score': 0.9887976,
 'index': 14,
 'word': 'Face',
 'start': 41,
 'end': 45},
 {'entity': 'I-LOC',
 'score': 0.9932106,
 'index': 16,
 'word': 'Brooklyn',
 'start': 49,
 'end': 57}]

## Question Answering

In [30]:
bot = pipeline("question-answering")
bot(
 question = "How am I doing?",
 context = "I have just came back from a very busy trip and I wish I could get some rest."
)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.21678458154201508,
 'start': 48,
 'end': 76,
 'answer': 'I wish I could get some rest'}

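This pipeline is extractive: it selects the span of `context` most likely to answer the question instead of generating new text. `start` and `end` are character offsets of that span in `context` (here `context[48:76]` is `'I wish I could get some rest'`).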
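
A rough sketch of how such a span can be chosen: softmax the per-token start and end logits, take the pair with the highest `p_start * p_end` such that the end does not precede the start and the span is not too long, then map the tokens back to character offsets. The question-answering pipeline scores spans along these lines, but here whole words found by a regex stand in for subword tokens and their offset mapping, and the logits are invented, so treat `extract_answer` as a hypothetical illustration.

```python
import math
import re

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def extract_answer(context, start_logits, end_logits, max_answer_len=15):
    """Pick the (start, end) token pair maximising p_start * p_end, then map to characters."""
    # Word-level "tokens" with character offsets, standing in for a tokenizer's offset mapping
    offsets = [(m.start(), m.end()) for m in re.finditer(r"\w+", context)]
    if len(start_logits) != len(offsets) or len(end_logits) != len(offsets):
        raise ValueError("need one start and one end logit per token")
    p_start, p_end = softmax(start_logits), softmax(end_logits)
    best = (-1.0, 0, 0)
    for i in range(len(offsets)):
        # The end token may not precede the start token or make the span longer than max_answer_len
        for j in range(i, min(i + max_answer_len, len(offsets))):
            score = p_start[i] * p_end[j]
            if score > best[0]:
                best = (score, i, j)
    score, i, j = best
    start, end = offsets[i][0], offsets[j][1]
    return {"score": score, "start": start, "end": end, "answer": context[start:end]}

context = "I have just came back from a very busy trip and I wish I could get some rest."
n_tokens = len(re.findall(r"\w+", context))
start_logits = [0.0] * n_tokens
end_logits = [0.0] * n_tokens
start_logits[11] = 5.0  # invented: peak on the second "I"
end_logits[17] = 5.0    # invented: peak on "rest"
print(extract_answer(context, start_logits, end_logits))
```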

## Summarization

In [31]:
summary = pipeline("summarization")

summary(
"""
 America has changed dramatically during recent years. Not only has the number of 
 graduates in traditional engineering disciplines such as mechanical, civil, 
 electrical, chemical, and aeronautical engineering declined, but in most of 
 the premier American universities engineering curricula now concentrate on 
 and encourage largely the study of engineering science. As a result, there 
 are declining offerings in engineering subjects dealing with infrastructure, 
 the environment, and related issues, and greater concentration on high 
 technology subjects, largely supporting increasingly complex scientific 
 developments. While the latter is important, it should not be at the expense 
 of more traditional engineering.

 Rapidly developing economies such as China and India, as well as other 
 industrial countries in Europe and Asia, continue to encourage and advance 
 the teaching of engineering. Both China and India, respectively, graduate 
 six and eight times as many traditional engineers as does the United States. 
 Other industrial countries at minimum maintain their output, while America 
 suffers an increasingly serious decline in the number of engineering graduates 
 and a lack of well-educated engineers.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil, electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India continue to encourage and advance the teaching of engineering .'}]
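
The summary above has a leading space and a space before each full stop (`recent years . The`). If that matters for display, a small regex pass over `summary_text` is enough. This is a plain post-processing sketch of my own with a hypothetical helper name, not an option the pipeline provides.

```python
import re

def tidy_summary(text):
    """Remove whitespace before punctuation and trim the ends."""
    return re.sub(r"\s+([.,;:!?])", r"\1", text).strip()

raw = (" America has changed dramatically during recent years . The number of engineering "
       "graduates in the U.S. has declined in traditional engineering disciplines such as "
       "mechanical, civil, electrical, chemical, and aeronautical engineering . Rapidly "
       "developing economies such as China and India continue to encourage and advance the "
       "teaching of engineering .")
print(tidy_summary(raw))
```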

## Translation

In [34]:
translator = pipeline("translation", model = "HariSekhar/Eng_Marathi_translation")

KeyError: 'translation'

In [None]:
translator("")