Maximum number of input tokens
#83 opened by sushant-m-nair
Hi,
Can someone please tell me the maximum number of tokens that can be input to this model?
Thanks.
I think it's 1024.
Is there a parameter we can use to find the maximum input length for the model we are using?
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
print(model.config.max_position_embeddings) # => 1024
So yes, 1024 is the maximum number of input tokens.
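You can also read the limit from the tokenizer (assuming it is set in the tokenizer config, which it is for this checkpoint), without loading the model weights:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
print(tokenizer.model_max_length)  # => 1024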
To check the token length of your input text:
from transformers import AutoTokenizer
text_to_summarize = "Put here your long text..."
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
tokens = tokenizer.encode(text_to_summarize, truncation=False)
print(f"My long input text has: {len(tokens)} tokens")
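Note that encode adds the model's special tokens (<s> and </s> for BART), so the count above includes them. Pass add_special_tokens=False if you only want the tokens for the text itself:

# Count only the text tokens, without <s> and </s>
tokens_no_special = tokenizer.encode(text_to_summarize, add_special_tokens=False)
print(f"Without special tokens: {len(tokens_no_special)} tokens")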
Truncate the input text to a specific token length (not a character or word count):
from transformers import pipeline, AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")

max_input_tokens = model.config.max_position_embeddings  # 1024 for this model

text_to_summary = "Your long text..."

# Encode without truncation to get the full token count (special tokens included)
tokens = tokenizer.encode(text_to_summary, truncation=False)
if len(tokens) > max_input_tokens:
    # Leave a small margin so the special tokens added on re-encoding still fit
    tokens = tokens[:max_input_tokens - 2]
    text_to_summary = tokenizer.decode(tokens, skip_special_tokens=True)

# Reuse the already-loaded model and tokenizer instead of downloading them again
summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)
result = summarizer(text_to_summary, max_length=150, min_length=10, do_sample=False)
print(result[0]['summary_text'])
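Depending on your transformers version, the summarization pipeline can also truncate the input for you, so the manual slicing above can often be skipped (a sketch; check that your version accepts the truncation argument):

# Let the pipeline cut anything beyond the model's limit
result = summarizer(text_to_summary, truncation=True, max_length=150, min_length=10, do_sample=False)
print(result[0]['summary_text'])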