Maximum number of input tokens

#83
by sushant-m-nair - opened

Hi,
Can someone please tell what is the maximum number of tokens that can be input to this model?
Thanks.

I think it's 1024.

Is there any parameter through which we can find the maximum length of input text for the model which we are using?

from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
print(model.config.max_position_embeddings) # => 1024

So yes, 1024 is the maximum number of input tokens.

To check your input text token length:

from transformers import AutoTokenizer

text_to_summarize = "Put your long text here..."
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
tokens = tokenizer.encode(text_to_summarize, truncation=False)
print(f"My long input text has: {len(tokens)} tokens")

Slice the input text to a specific token length (not a character count or word count):

from transformers import pipeline
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer


model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
max_input_tokens = model.config.max_position_embeddings

text_to_summary = "Your long text..."

tokens = tokenizer.encode(text_to_summary, truncation=False)

if len(tokens) > max_input_tokens:
    # Leave one slot of headroom for the special token the pipeline
    # will add when it re-encodes the decoded text
    tokens = tokens[:max_input_tokens - 1]
    text_to_summary = tokenizer.decode(tokens, skip_special_tokens=True)

# Reuse the already-loaded model and tokenizer instead of loading them again
summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)
result = summarizer(text_to_summary, max_length=150, min_length=10, do_sample=False)
print(result[0]['summary_text'])
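For intuition on why you slice by tokens and not by characters or words: a token count and a character count are different things. The toy whitespace tokenizer below is hypothetical (just for illustration; BART actually uses a subword tokenizer, so real token boundaries don't line up with words either):

```python
def toy_tokenize(text):
    # Hypothetical tokenizer: splits on whitespace only.
    # The real BART tokenizer splits words into subword pieces.
    return text.split()

def truncate_to_tokens(text, max_tokens):
    # Keep only the first max_tokens tokens, then rebuild the text
    tokens = toy_tokenize(text)
    return " ".join(tokens[:max_tokens])

text = "one two three four five"
print(truncate_to_tokens(text, 3))       # -> "one two three"
print(len(truncate_to_tokens(text, 3)))  # -> 13 characters, not 3
```

Truncating to 3 tokens keeps 13 characters here; with a subword tokenizer the relationship between tokens and characters is even less predictable, which is why the snippet above encodes, slices the token ids, and decodes back.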