jasonmcaffee committed on
Commit
77aa390
·
1 Parent(s): 03563b3

Update README.md

Files changed (1)
  1. README.md +61 -1
README.md CHANGED
@@ -4,6 +4,8 @@ license: mit
  # Overview
  This model has been fine-tuned for text summarization. It was created by using LoRA to fine-tune the flan-t5-large model on the [SAMsum training dataset](https://huggingface.co/datasets/samsum).

  ## SAMsum
  SAMsum is a corpus of 16k dialogues and their corresponding summaries.
@@ -163,4 +165,62 @@ from google.colab import files
  files.download("/content/flan-t5-large-samsum.zip")
  ```

- Upload the contents of that zip file to huggingface

  # Overview
  This model has been fine-tuned for text summarization. It was created by using LoRA to fine-tune the flan-t5-large model on the [SAMsum training dataset](https://huggingface.co/datasets/samsum).

+ This document explains how the model was fine-tuned, saved to disk, and uploaded to Hugging Face, and then demonstrates how to use it.
+
  ## SAMsum
  SAMsum is a corpus of 16k dialogues and their corresponding summaries.

  files.download("/content/flan-t5-large-samsum.zip")
  ```

+ Upload the contents of that zip file to Hugging Face.
+
+ # Code to use the fine-tuned model
+
+ ## Notebook Source
+ [Notebook using the Hugging Face hosted model](https://colab.research.google.com/drive/1kqADOA9vaTsdecx4u-7XWJJia62WV0cY?pli=1#scrollTo=KMs70mdIxaam)
+
+ ## Load the model, tokenizer, and LoRA adapter (PEFT)
+
+ ```
+ # Load the jasonmcaffee/flan-t5-large-samsum model and tokenizer
+ import torch
+ from peft import PeftModel, PeftConfig
+ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+
+ peft_model_id = "jasonmcaffee/flan-t5-large-samsum"
+ # The adapter config records which base model the LoRA weights were trained on
+ config = PeftConfig.from_pretrained(peft_model_id)
+
+ # Load the base model in 8-bit (requires the bitsandbytes package)
+ model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, load_in_8bit=True, device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
+
+ # Load the LoRA adapter
+ model = PeftModel.from_pretrained(model, peft_model_id, device_map="auto")
+ model.eval()
+ ```
+
+ ## Have the model summarize text!
+ We now have a model that can summarize text for us.
+
+ Summarization takes ~30 seconds.
+ ```
+ dialogue = """The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.
+ """
+
+ input_ids = tokenizer(dialogue, return_tensors="pt", truncation=True).input_ids.cuda()
+ # with torch.inference_mode():
+ outputs = model.generate(
+     input_ids=input_ids,
+     min_length=20,
+     max_new_tokens=100,
+     # Exponential penalty to the length that is used with beam-based generation.
+     # It is applied as an exponent to the sequence length, which in turn is used
+     # to divide the score of the sequence. Since the score is the log likelihood
+     # of the sequence (i.e. negative), length_penalty > 0.0 promotes longer
+     # sequences, while length_penalty < 0.0 encourages shorter sequences.
+     length_penalty=1.9,
+     num_beams=4,
+     temperature=0.9,  # only takes effect when do_sample=True
+     top_k=150,  # default 50; only takes effect when do_sample=True
+     repetition_penalty=2.1,
+     # do_sample=True,
+     top_p=0.9,  # only takes effect when do_sample=True
+ )
+ print(f"input sentence: {dialogue}\n{'---' * 20}")
+
+ summarization = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]
+
+ print(f"summary:\n{summarization}")
+ ```
+ Prints:
+ > The Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.
+
+ The notebook also loads flan-t5 without SAMsum fine-tuning, which produces this summary:
+ > The Eiffel Tower is the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930.
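
The `length_penalty` comment in the `generate` call above describes how beam search normalizes a hypothesis score by its length. As a rough illustration only, the sketch below assumes the score is the summed log-likelihood divided by the sequence length raised to `length_penalty`, as that comment states. `beam_score` and `preferred` are hypothetical helpers, and the two hypotheses use made-up numbers; none of this comes from the notebook.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalized beam score: sum of log-probs / length ** length_penalty."""
    return sum_logprobs / (length ** length_penalty)


# Hypothetical hypotheses with made-up values: (summed log-likelihood, token count)
short_hyp = (-4.0, 5)
long_hyp = (-6.0, 10)


def preferred(length_penalty: float) -> str:
    """Return which hypothesis has the higher (less negative) normalized score."""
    short = beam_score(*short_hyp, length_penalty)
    long_ = beam_score(*long_hyp, length_penalty)
    return "long" if long_ > short else "short"


if __name__ == "__main__":
    # Compare no normalization, plain averaging, and the value used in the README
    for lp in (0.0, 1.0, 1.9):
        print(f"length_penalty={lp}: prefers the {preferred(lp)} hypothesis")
```

With these numbers, no normalization (`length_penalty=0.0`) favors the short hypothesis, while `1.0` and the `1.9` used above both favor the long one, which matches the comment's claim that positive values promote longer sequences.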