---
language: en
datasets:
- efra
license: apache-2.0
tags:
- summarization
- flan-t5
- legal
- food
model_type: t5
pipeline_tag: text2text-generation
---
|
|
|
# Flan-T5 Large Fine-Tuned on EFRA Dataset
|
|
|
This is a fine-tuned version of [Flan-T5 Large](https://huggingface.co/google/flan-t5-large) on the **EFRA dataset** for summarizing legal documents related to food regulations and policies.
|
|
|
## Model Description
|
|
|
Flan-T5 is a sequence-to-sequence model trained for text-to-text tasks. This fine-tuned version is specifically optimized for summarizing legal text in the domain of food legislation, regulatory requirements, and compliance documents.
|
|
|
### Fine-Tuning Details

- **Base Model**: [google/flan-t5-large](https://huggingface.co/google/flan-t5-large)
- **Dataset**: EFRA (a curated dataset of legal documents in the food domain)
- **Objective**: Summarization of legal documents
- **Framework**: Hugging Face Transformers
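
The exact training recipe is not documented in this card. For orientation only, a fine-tuning run with these components might look like the sketch below; the dataset identifier (`efra`), the column names `document`/`summary`, and every hyperparameter are assumptions, not the settings used to produce this checkpoint.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

base = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSeq2SeqLM.from_pretrained(base)

# Hypothetical dataset id and column names; adjust to wherever EFRA is hosted.
raw = load_dataset("efra")

def preprocess(batch):
    model_inputs = tokenizer(
        ["summarize: " + doc for doc in batch["document"]],
        max_length=512,
        truncation=True,
    )
    labels = tokenizer(text_target=batch["summary"], max_length=150, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = raw["train"].map(preprocess, batched=True, remove_columns=raw["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="flan_t5_large_efra",
    learning_rate=3e-5,              # assumed, not the released value
    per_device_train_batch_size=4,   # assumed
    num_train_epochs=3,              # assumed
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```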
|
|
|
## Applications
|
|
|
This model is suitable for:

- Summarizing legal texts in the food domain
- Extracting key information from lengthy regulatory documents (a sketch for handling long inputs follows this list)
- Assisting legal professionals and food companies in understanding compliance requirements
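
Because the encoder sees at most 512 tokens in the usage example below, longer documents are truncated in a single pass. A minimal map-reduce-style sketch for long inputs follows: summarize each chunk, then summarize the concatenated partial summaries. The chunk size and the two-pass strategy are illustrative choices, not part of the released model.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "giuid/flan_t5_large_summarization_v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

def summarize(text, max_new_tokens=150):
    inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=5, early_stopping=True)
    return tokenizer.decode(output[0], skip_special_tokens=True)

def summarize_long(text, chunk_tokens=480):
    # Split on token boundaries so each chunk fits the 512-token encoder window.
    ids = tokenizer(text, truncation=False)["input_ids"]
    chunks = [ids[i:i + chunk_tokens] for i in range(0, len(ids), chunk_tokens)]
    partial = [summarize(tokenizer.decode(c, skip_special_tokens=True)) for c in chunks]
    # Second pass: condense the concatenated chunk summaries into one.
    return summarize(" ".join(partial))
```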
|
|
|
## Example Usage
|
|
|
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned model and tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("giuid/flan_t5_large_summarization_v2")
tokenizer = AutoTokenizer.from_pretrained("giuid/flan_t5_large_summarization_v2")

# Input text
input_text = "Your lengthy legal document text here..."

# Tokenize (inputs beyond 512 tokens are truncated) and generate a summary
inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(
    **inputs,  # passes input_ids and attention_mask
    max_length=150,
    num_beams=5,
    early_stopping=True,
)

# Decode the summary
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
```
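
Beam search (`num_beams=5`) usually yields more faithful summaries than greedy decoding at the cost of slower generation. Since inputs beyond 512 tokens are truncated, consider the chunked approach sketched in the Applications section for full-length regulatory documents.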