update README to be sexy
README.md
CHANGED
@@ -132,8 +132,9 @@ model-index:
</a>

Summarize long text and get a SparkNotes-esque summary of arbitrary topics!

- Generalizes reasonably well to academic & narrative text.
- This is the XL checkpoint, which **from a human-evaluation perspective, [produces even better summaries](https://long-t5-xl-book-summary-examples.netlify.app/)**.

A simple example/use case with [the base model](https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary) on ASR is [here](https://longt5-booksum-example.netlify.app/).

@@ -141,17 +142,41 @@
A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/navy-seal-copypasta):

> In this chapter, the monster explains how he intends to exact revenge on "the little b\*\*\*\*" who insulted him. He tells the kiddo that he is a highly trained and experienced killer who will use his arsenal of weapons--including his access to the internet--to exact justice on the little brat.

While this is a somewhat crude example, try running this copypasta through other summarization models to see the difference in comprehension (_despite it not even being a "long" text!_).
* * *

**Contents**

<!-- TOC -->

- [Description](#description)
- [How-To in Python](#how-to-in-python)
  - [Beyond the basics](#beyond-the-basics)
- [About](#about)
  - [Intended uses & limitations](#intended-uses--limitations)
  - [Training and evaluation data](#training-and-evaluation-data)
  - [Eval results](#eval-results)
- [FAQ](#faq)
  - [How can I run inference with this on CPU?](#how-can-i-run-inference-with-this-on-cpu)
  - [How to run inference over a very long (30k+ tokens) document in batches?](#how-to-run-inference-over-a-very-long-30k-tokens-document-in-batches)
  - [How to fine-tune further?](#how-to-fine-tune-further)
- [Training procedure](#training-procedure)
  - [Updates](#updates)
  - [Training hyperparameters](#training-hyperparameters)
  - [Framework versions](#framework-versions)

<!-- /TOC -->

* * *
## Description

A fine-tuned version of [google/long-t5-tglobal-xl](https://huggingface.co/google/long-t5-tglobal-xl) on the `kmfoda/booksum` dataset.

Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf)

## How-To in Python

@@ -173,9 +198,10 @@
result = summarizer(long_text)
print(result[0]["summary_text"])
```
### Beyond the basics

There are two additional points to consider beyond simple inference: adjusting decoding parameters for improved performance, and quantization for reduced memory consumption.
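
For a concrete sense of the first point, generation parameters can be passed straight through the pipeline call. A minimal sketch follows; the repo id and all parameter values are illustrative assumptions, not recommendations from this card:

```python
from transformers import pipeline

# Repo id assumed to be this card's checkpoint; adjust if it differs.
summarizer = pipeline(
    "summarization",
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
)

long_text = "Here is a lot of text I don't want to read. Replace me"

# Extra kwargs are forwarded to model.generate(); the values here are placeholders to tinker with.
result = summarizer(
    long_text,
    min_length=8,
    max_length=256,
    num_beams=4,
    no_repeat_ngram_size=3,
    encoder_no_repeat_ngram_size=4,
    repetition_penalty=2.5,
    early_stopping=True,
)
print(result[0]["summary_text"])
```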
#### Adjusting parameters

@@ -189,7 +215,6 @@
How-to: essentially, ensure you have `transformers` pip-installed from the **latest GitHub repo `main`**, along with `bitsandbytes`.

Install the latest `main` branch:

```bash
@@ -217,10 +242,9 @@
Do you love to ask questions? Awesome. But first, check out the [how LLM.int8 works blog post](https://huggingface.co/blog/hf-bitsandbytes-integration) by Hugging Face.

\* More rigorous metric-based investigation into comparing beam-search summarization with and without LLM.int8 will take place over time.
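
For orientation, here is a minimal sketch of what 8-bit loading can look like once the dependencies above are installed; the repo id and generation settings are assumptions, and the Colab demo linked at the top of the card remains the reference:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "pszemraj/long-t5-tglobal-xl-16384-book-summary"  # assumed repo id for this card

tokenizer = AutoTokenizer.from_pretrained(model_name)
# LLM.int8 needs a CUDA GPU plus recent `transformers` (main), `accelerate`, and `bitsandbytes`.
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
)

long_text = "Replace me with a long document."
inputs = tokenizer(long_text, return_tensors="pt").to(model.device)
with torch.inference_mode():
    summary_ids = model.generate(**inputs, max_length=512, num_beams=2)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```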
* * *

## About

@@ -229,47 +253,49 @@
While this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.

Specifically: negation statements (e.g., the model says _This thing does not have [ATTRIBUTE]_ where instead it should have said _This thing has a lot of [ATTRIBUTE]_).

- I'm sure someone will write a paper on this eventually (if there isn't one already), but you can usually fact-check this by comparing a specific claim to what the surrounding sentences imply.

### Training and evaluation data

`kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209).

- **Initial fine-tuning** only used input text with 12288 input tokens or fewer and 1024 output tokens or fewer (_i.e., rows with longer sequences were dropped before training_) for memory reasons. Per a brief analysis, summaries in the 12288-16384 token range are in the **small** minority of this dataset (a sketch of this filtering follows the list).
- In addition, this initial training combined the training and validation sets and trained on them in aggregate to increase the functional dataset size. **Therefore, take the validation set results with a grain of salt; the primary metrics should (always) come from the test set.**
- The **final phases of fine-tuning** used the standard convention of 16384 input / 1024 output tokens, keeping everything (truncating longer sequences). This did not appear to change the loss/performance much.
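
The token-length filtering described in the first bullet might look roughly like the sketch below. This is not the original preprocessing script; the column names ("chapter", "summary_text") and tokenizer choice are assumptions to check against the actual dataset:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Column names are assumptions -- inspect dataset.column_names before relying on them.
dataset = load_dataset("kmfoda/booksum", split="train")
tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-xl")

MAX_INPUT_TOKENS = 12288   # initial fine-tuning dropped rows with longer inputs...
MAX_OUTPUT_TOKENS = 1024   # ...or longer target summaries

def within_length_budget(example):
    n_input = len(tokenizer(example["chapter"]).input_ids)
    n_target = len(tokenizer(example["summary_text"]).input_ids)
    return n_input <= MAX_INPUT_TOKENS and n_target <= MAX_OUTPUT_TOKENS

filtered = dataset.filter(within_length_budget)
print(f"kept {len(filtered)} of {len(dataset)} rows")
```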
### Eval results

Official results with the [model evaluator](https://huggingface.co/spaces/autoevaluate/model-evaluator) will be computed and posted here.

**Please read the note above: due to the training methods, validation set performance looks better than the test set results will be.** The model achieves the following results on the evaluation set:

- eval_loss: 1.2756
- eval_rouge1: 41.8013
- eval_rouge2: 12.0895
- eval_rougeL: 21.6007
- eval_rougeLsum: 39.5382
- eval_gen_len: 387.2945
- eval_runtime: 13908.4995
- eval_samples_per_second: 0.107
- eval_steps_per_second: 0.027

***** predict/test metrics (initial) *****
predict_gen_len = 506.4368
predict_loss = 2.028
predict_rouge1 = 36.8815
predict_rouge2 = 8.0625
predict_rougeL = 17.6161
predict_rougeLsum = 34.9068
predict_runtime = 2:04:14.37
predict_samples = 1431
predict_samples_per_second = 0.192
predict_steps_per_second = 0.048

\* Evaluating a big model is not as easy as it seems. Doing a bit more investigating.
* * *

## FAQ

@@ -287,8 +313,7 @@
See [train with a script](https://huggingface.co/docs/transformers/run_scripts) and [the summarization scripts](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization)

* * *

## Training procedure

@@ -299,26 +324,27 @@
### Training hyperparameters

The following hyperparameters were used during training (a rough code mapping follows the list):

- learning_rate: 0.0006
- train_batch_size: 1
- eval_batch_size: 1
- seed: 10350
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
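
For readers who want to reproduce something similar, the hyperparameters above map roughly onto `Seq2SeqTrainingArguments` as sketched below. This is an approximation, not the original training script; the output path is a placeholder, and the Adam betas/epsilon listed above are already the library defaults:

```python
from transformers import Seq2SeqTrainingArguments

# Rough mapping of the list above. Launched across 4 GPUs (e.g. via torchrun), so that
# 1 per-device batch * 32 accumulation steps * 4 devices = total train batch size of 128.
training_args = Seq2SeqTrainingArguments(
    output_dir="./long-t5-tglobal-xl-booksum",  # placeholder path
    learning_rate=6e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=32,
    lr_scheduler_type="constant",
    num_train_epochs=1.0,
    seed=10350,
    predict_with_generate=True,
)
```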
\*_Prior training sessions used roughly similar parameters (learning rates were higher); multiple sessions were required as this takes eons to train._

### Framework versions

- Transformers 4.25.0.dev0
- Pytorch 1.13.0+cu117
- Datasets 2.6.1
- Tokenizers 0.13.1

* * *