---
license: mit
language:
- en
base_model:
- google-t5/t5-base
datasets:
- abisee/cnn_dailymail
metrics:
- rouge
---

# T5-Base-Sum

This model is a fine-tuned version of `T5` (base model: `google-t5/t5-base`) for summarization. It was fine-tuned on 25,000 training samples from the CNN/DailyMail training set and is hosted on Hugging Face for easy access and use. It aims to produce precise, factually consistent, and concise summaries, driven by a custom cyclic attention mechanism.

## Model Usage

Below is an example of how to load and use this model for summarization:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the model and tokenizer from Hugging Face
model = T5ForConditionalGeneration.from_pretrained("Vijayendra/T5-Base-Sum")
tokenizer = T5Tokenizer.from_pretrained("Vijayendra/T5-Base-Sum")

# Example of using the model for summarization
article = """
Videos that say approved vaccines are dangerous and cause autism, cancer or infertility are among those that will be taken down, the company said. The policy includes the termination of accounts of anti-vaccine influencers. Tech giants have been criticised for not doing more to counter false health information on their sites. In July, US President Joe Biden said social media platforms were largely responsible for people's scepticism in getting vaccinated by spreading misinformation, and appealed for them to address the issue. YouTube, which is owned by Google, said 130,000 videos were removed from its platform since last year, when it implemented a ban on content spreading misinformation about Covid vaccines. In a blog post, the company said it had seen false claims about Covid jabs "spill over into misinformation about vaccines in general".
The new policy covers long-approved vaccines, such as those against measles or hepatitis B. "We're expanding our medical misinformation policies on YouTube with new guidelines on currently administered vaccines that are approved and confirmed to be safe and effective by local health authorities and the WHO," the post said, referring to the World Health Organization.
"""

# T5 expects a task prefix; inputs longer than 512 tokens are truncated
inputs = tokenizer.encode("summarize: " + article, return_tensors="pt", max_length=512, truncation=True)
summary_ids = model.generate(inputs, max_length=150, min_length=100, length_penalty=2.0, num_beams=4, early_stopping=True)

# Decode and print the summary
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Summary:")
print(summary)

# Example of a random article (can replace this with any article)
random_article = """
Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals including humans. Leading AI textbooks define the field as the study of "intelligent agents": any system that perceives its environment and takes actions that maximize its chance of achieving its goals. Some popular accounts use the term "artificial intelligence" to describe machines that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem-solving". As machines become increasingly capable, tasks considered to require "intelligence" are often removed from the definition of AI, a phenomenon known as the AI effect. A quip in Tesler's Theorem says "AI is whatever hasn't been done yet."
"""

# Tokenize the input article
inputs = tokenizer.encode("summarize: " + random_article, return_tensors="pt", max_length=512, truncation=True)

# Generate summary (wider beam search, no early stopping)
summary_ids = model.generate(inputs, max_length=150, min_length=100, length_penalty=3.0, num_beams=7, early_stopping=False)

# Decode and print the summary
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Summary:")
print(summary)

# Compare with some other models
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    BartForConditionalGeneration,
    BartTokenizer,
    PegasusForConditionalGeneration,
    PegasusTokenizer,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

# Function to summarize with any model
def summarize_article(article, model, tokenizer):
    # Only T5 checkpoints use the "summarize: " task prefix; for the other
    # models the prefix would simply become part of the input text.
    prefix = "summarize: " if model.config.model_type == "t5" else ""
    inputs = tokenizer.encode(prefix + article, return_tensors="pt", max_length=512, truncation=True)
    summary_ids = model.generate(inputs, max_length=150, min_length=100, length_penalty=2.0, num_beams=4, early_stopping=True)
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary

# Load our fine-tuned T5 model and tokenizer
t5_model_custom = T5ForConditionalGeneration.from_pretrained("Vijayendra/T5-Base-Sum")
t5_tokenizer_custom = T5Tokenizer.from_pretrained("Vijayendra/T5-Base-Sum")

# Load a different pretrained summarization model: mT5 fine-tuned on XL-Sum.
# This is an mT5 checkpoint, so the Auto classes are used to pick the matching architecture.
t5_model_pretrained = AutoModelForSeq2SeqLM.from_pretrained("csebuetnlp/mT5_multilingual_XLSum")
t5_tokenizer_pretrained = AutoTokenizer.from_pretrained("csebuetnlp/mT5_multilingual_XLSum")

# Load Pegasus model and tokenizer
pegasus_model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")
pegasus_tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-xsum")

# Load BART model and tokenizer
bart_model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
bart_tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")

# Example article for summarization
article = """
Videos that say approved vaccines are
dangerous and cause autism, cancer or infertility are among those that will be taken down, the company said. The policy includes the termination of accounts of anti-vaccine influencers. Tech giants have been criticised for not doing more to counter false health information on their sites. In July, US President Joe Biden said social media platforms were largely responsible for people's scepticism in getting vaccinated by spreading misinformation, and appealed for them to address the issue. YouTube, which is owned by Google, said 130,000 videos were removed from its platform since last year, when it implemented a ban on content spreading misinformation about Covid vaccines. In a blog post, the company said it had seen false claims about Covid jabs "spill over into misinformation about vaccines in general".
The new policy covers long-approved vaccines, such as those against measles or hepatitis B. "We're expanding our medical misinformation policies on YouTube with new guidelines on currently administered vaccines that are approved and confirmed to be safe and effective by local health authorities and the WHO," the post said, referring to the World Health Organization.
"""

# Summarize with our fine-tuned T5 model
t5_summary_custom = summarize_article(article, t5_model_custom, t5_tokenizer_custom)

# Summarize with the pretrained mT5 model for summarization
t5_summary_pretrained = summarize_article(article, t5_model_pretrained, t5_tokenizer_pretrained)

# Summarize with Pegasus model
pegasus_summary = summarize_article(article, pegasus_model, pegasus_tokenizer)

# Summarize with BART model
bart_summary = summarize_article(article, bart_model, bart_tokenizer)

# Print summaries for comparison
print("T5 base with Cyclic Attention Summary:")
print(t5_summary_custom)
print("\nPretrained mT5_multilingual_XLSum Summary:")
print(t5_summary_pretrained)
print("\nPegasus Xsum Summary:")
print(pegasus_summary)
print("\nBART Large CNN Summary:")
print(bart_summary)
```
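
The metadata lists ROUGE as the evaluation metric, but this card does not include scoring code. The following is a minimal sketch, assuming a simplified unigram-overlap ROUGE-1 F1 written in plain Python. The function name `rouge1_f1` and the reference and candidate strings are hypothetical and are not taken from the author's evaluation. Established ROUGE scorers apply their own tokenization and optional stemming, so numbers from this sketch are only useful for rough side-by-side comparison of summaries against the same reference.

```python
import re
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts."""
    # Assumption: lowercase and keep alphanumeric tokens only. Official
    # scorers use their own tokenization rules.
    cand_tokens = re.findall(r"[a-z0-9]+", candidate.lower())
    ref_tokens = re.findall(r"[a-z0-9]+", reference.lower())
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Counter intersection clips each token count to the smaller of the two.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and candidate summaries, for illustration only.
reference = "YouTube will remove videos that spread misinformation about approved vaccines."
candidates = {
    "summary_a": "YouTube will remove videos spreading vaccine misinformation.",
    "summary_b": "The company published a blog post.",
}
scores = {name: rouge1_f1(text, reference) for name, text in candidates.items()}
for name, score in scores.items():
    print(f"{name}: ROUGE-1 F1 = {score:.3f}")
```

To score real model output, pass a generated summary string as `candidate` and a human-written summary of the same article as `reference`.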