---
license:
- apache-2.0
- bsd-3-clause
tags:
- summarization
- summary
- booksum
- long-document
- long-form
- tglobal-xl
- XL
datasets:
- kmfoda/booksum
metrics:
- rouge
inference: false
model-index:
- name: pszemraj/long-t5-tglobal-xl-16384-book-summary
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: multi_news
      type: multi_news
      config: default
      split: test
    metrics:
    - type: rouge
      value: 36.2043
      name: ROUGE-1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYzRmMmUyOTVjMmJmZTRiZDcyYzY3MTQ1MmUyNDA5NjVhYzEzYzBiNzcxYTRhMDQ3OTlhMGZjYmJlNDM1M2NjYyIsInZlcnNpb24iOjF9._uArOQ1_0znXDPXMq7unA1OHB-XbgqzzKRbFRcVUzTUJdWk26LiSa2pEEVNNmJPg6Uo7CAvONmhpEswLvl9TAg
    - type: rouge
      value: 8.424
      name: ROUGE-2
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzg0MzljYjVjYWQ3MmRkZDBlOGI5M2RiMGU0M2UwZGUzMDg2NTU0NjcwMTNiN2ZmODEzNTQ0MmEwNDA3NDA5MSIsInZlcnNpb24iOjF9.Dzj85ld6TjosQ8KyUdoadzicMLedEFrICC6Q-08O3qx28d9B9Uke1zw-VWabiuesPEDTRGbWuBgPA5vxYWUZAw
    - type: rouge
      value: 17.3721
      name: ROUGE-L
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNDA3ZjZmODAwMTNlM2RlZmJlMDI5MGVkMGRkMTBjMTYzNDk5ZjFiNTY5MWE1MDUwNWI2MDE4ZDA2YWMwMmI2NCIsInZlcnNpb24iOjF9.MOV_nId0XAK1eMQssG5GN9DsitZaTrxl4jdCJnOg9EZ0-vAw227ln599YV5YfZ1OPJnWwek6rneqqyONiHn9AQ
    - type: rouge
      value: 32.3994
      name: ROUGE-LSUM
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZmY3MDMwOTZjNWI0YTk1MDgwMzJkYTFiN2U5YWU0Mzc0MWRiMzc1NzZlMDhjMWUwMmY2ODI2MjI5ODBkYWUxOSIsInZlcnNpb24iOjF9._BwGIZbcA4pUBkEAL0cW-JPPta0KSoGug4Z7vogHacUz-AEhIOI5ICUldZh0pt9OK67MpUSzpShJOu3rSt5YDQ
    - type: loss
      value: 2.0843334197998047
      name: loss
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOWFhMmE5ZjA3ODM4YmVjMDMyMjk5YjNlMjA1MGMzOWY0NTRlYzk1YjZiMzQxMDMxOTMwMjFkNTdmNjM1NDcyMyIsInZlcnNpb24iOjF9.3wbXV4CIIgnfXAnnRztdOR12PwsWsEfiglQQ09K-C1EgW4gai4x9l-wTE2OZ7CTWkuk_tr4tL_uqOCXLZRMtCQ
    - type: gen_len
      value: 248.3572
      name: gen_len
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMWZhOGMwMDJjNGU2MzA2YzI1OWU1ZDY5N2NjZmM1YTA5NDg1MzUwNmU1YTBhNjQyNWYwYzA3OGNmODFjMmE2NSIsInZlcnNpb24iOjF9.Rc9u89zCdbFnjsnmq65l_JvCtUwOX_ZWapKJpTZ-rC8HxcUVfi2Ash2QfvvvxHH_YWhwklxxdnNa0HCm46qLAA
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: billsum
      type: billsum
      config: default
      split: test
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 41.3645
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 16.144
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 24.2981
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 35.3234
      verified: true
    - name: loss
      type: loss
      value: 1.282260775566101
      verified: true
    - name: gen_len
      type: gen_len
      value: 291.8158
      verified: true
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: ccdv/arxiv-summarization
      type: ccdv/arxiv-summarization
      config: document
      split: test
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 36.3225
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 9.3743
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 19.8396
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 32.2532
      verified: true
    - name: loss
      type: loss
      value: 2.146871566772461
      verified: true
    - name: gen_len
      type: gen_len
      value: 186.2966
      verified: true
---

# long-t5-tglobal-xl + BookSum

<a href="https://colab.research.google.com/gist/pszemraj/c19e32baf876deb866c31cd46c86e893/long-t5-xl-accelerate-test.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Summarize long text and get a SparkNotes-like summary of any topic!

- Generalizes reasonably well to academic and narrative text.
- This is the XL checkpoint, which **produces even better summaries [from a human evaluation perspective](https://long-t5-xl-book-summary-examples.netlify.app/)** than the smaller checkpoints.

A simple example use case with [the base model](https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary), summarizing ASR (speech recognition) transcripts, is [here](https://longt5-booksum-example.netlify.app/).

## Cheeky Proof-of-Concept

A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/navy-seal-copypasta):

> In this chapter, the monster explains how he intends to exact revenge on "the little b\*\*\*\*" who insulted him. He tells the kiddo that he is a highly trained and experienced killer who will use his arsenal of weapons--including his access to the internet--to exact justice on the little brat.

While this is a crude example, try running this copypasta through other summarization models to see the difference in comprehension (_even though it's not a "long" text!_).

* * *

**Contents**

<!-- TOC -->

- [Description](#description)
- [How-To in Python](#how-to-in-python)
  - [Beyond the basics](#beyond-the-basics)
    - [Adjusting parameters](#adjusting-parameters)
    - [LLM.int8 Quantization](#llmint8-quantization)
- [About](#about)
  - [Intended uses & limitations](#intended-uses--limitations)
  - [Training and evaluation data](#training-and-evaluation-data)
  - [Eval results](#eval-results)
- [FAQ](#faq)
  - [How can I run inference with this on CPU?](#how-can-i-run-inference-with-this-on-cpu)
  - [How to run inference over a very long (30k+ tokens) document in batches?](#how-to-run-inference-over-a-very-long-30k-tokens-document-in-batches)
  - [How to fine-tune further?](#how-to-fine-tune-further)
  - [Are there simpler ways to run this?](#are-there-simpler-ways-to-run-this)
- [Training procedure](#training-procedure)
  - [Updates](#updates)
  - [Training hyperparameters](#training-hyperparameters)
  - [Framework versions](#framework-versions)

<!-- /TOC -->

* * *

## Description

A fine-tuned version of [google/long-t5-tglobal-xl](https://huggingface.co/google/long-t5-tglobal-xl) on the `kmfoda/booksum` dataset.

Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf).

## How-To in Python

Install or update `transformers` with `pip install -U transformers`.

Summarize text with the `pipeline` API:

```python
import torch
from transformers import pipeline

# use the first GPU if one is available, otherwise fall back to CPU
summarizer = pipeline(
    "summarization",
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)

long_text = "Here is a lot of text I don't want to read. Replace me"

result = summarizer(long_text)
print(result[0]["summary_text"])
```

### Beyond the basics

There are two additional points to consider beyond simple inference: adjusting decoding parameters for better summaries, and quantization for reduced memory consumption.

#### Adjusting parameters

Pass [additional text-generation parameters, such as beam search settings,](https://huggingface.co/blog/how-to-generate) when calling `summarizer` to get even higher-quality results.
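A minimal sketch of what that can look like, reusing the `summarizer` pipeline from the example above. The values in `GEN_KWARGS` are illustrative assumptions, not settings tuned for this model, and `summarize` is a hypothetical helper; each key is a standard `generate()` argument that the summarization pipeline forwards to the model.

```python
from typing import Any, Callable, Dict, List

# Illustrative beam-search settings (assumed values, not tuned for this model).
# The summarization pipeline forwards extra keyword arguments to generate().
GEN_KWARGS: Dict[str, Any] = {
    "num_beams": 4,  # beam search width
    "max_length": 512,  # maximum summary length in tokens
    "min_length": 8,  # discourage empty or trivially short summaries
    "no_repeat_ngram_size": 3,  # never repeat the same trigram in the output
    "repetition_penalty": 2.5,  # penalize tokens that were already generated
    "early_stopping": True,  # stop once num_beams finished candidates exist
}


def summarize(
    summarizer: Callable[..., List[Dict[str, str]]], text: str, **overrides: Any
) -> str:
    """Hypothetical helper: call a summarization pipeline with GEN_KWARGS.

    Any keyword passed in `overrides` replaces the matching default.
    """
    kwargs = {**GEN_KWARGS, **overrides}
    result = summarizer(text, **kwargs)
    return result[0]["summary_text"]
```

For example, `summarize(summarizer, long_text, num_beams=2)` keeps the other defaults but uses a narrower beam, which is typically faster.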

#### LLM.int8 Quantization

> alternative section title: how to get this monster to run inference on free colab runtimes

Via [this PR](https://github.com/huggingface/transformers/pull/20341), LLM.int8 is now supported for `long-t5` models.

- Per **initial tests**, summarization quality seems to hold while using _significantly_ less memory! \*
- A version of this model quantized to int8 is [already on the hub here](https://huggingface.co/pszemraj/long-t5-tglobal-xl-16384-book-summary-8bit), so if you're using the 8-bit version anyway, you can start there for a 3.5 GB download only!
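To see why 8-bit weights help, a back-of-the-envelope estimate of weight memory is parameters × bits per parameter / 8 bytes. The sketch below assumes roughly 3 billion parameters for the XL checkpoint (an approximation, not an official count) and ignores activations, caches, and any layers kept in higher precision, so real usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Rough size of the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    return n_params * bytes_per_param / 1e9


# Assumption: ~3 billion parameters for the XL checkpoint (approximate).
N_PARAMS_XL = 3e9

for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("int8", 8)]:
    print(f"{label:>9}: ~{weight_memory_gb(N_PARAMS_XL, bits):.1f} GB")
```

Under that assumption, this gives about 12, 6, and 3 GB respectively: int8 weights take a quarter of the fp32 footprint, in the same ballpark as the 3.5 GB download of the 8-bit checkpoint.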

First, make sure you have the latest versions of the relevant packages:

```bash
pip install -U transformers bitsandbytes accelerate
```

Load in 8-bit (_the magic is done by `bitsandbytes` behind the scenes_):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained(
    "pszemraj/long-t5-tglobal-xl-16384-book-summary"
)

model = AutoModelForSeq2SeqLM.from_pretrained(
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
    load_in_8bit=True,  # swap linear layers for bitsandbytes int8 versions
    device_map="auto",  # let accelerate place the layers on available device(s)
)
```

The code above is already included in the Colab demo linked at the top of this model card.

\* More rigorous, metrics-based research comparing beam-search summarization with and without LLM.int8 will take place over time.
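Once the 8-bit model is loaded, summaries come from the usual tokenize, generate, decode loop. The sketch below is a hypothetical helper (`summarize_with_model` is not part of `transformers`); the 16384-token input limit follows this checkpoint's name, and decoding settings are whatever you pass through.

```python
from typing import Any


def summarize_with_model(
    model: Any,
    tokenizer: Any,
    text: str,
    max_input_tokens: int = 16384,
    **gen_kwargs: Any,
) -> str:
    """Hypothetical helper: summarize `text` with an already-loaded seq2seq model."""
    # Tokenize, truncating anything beyond the model's input window.
    inputs = tokenizer(
        text, truncation=True, max_length=max_input_tokens, return_tensors="pt"
    )
    # With device_map="auto", send inputs to the device of the first parameters.
    inputs = inputs.to(model.device)
    # Beam search / sampling settings are passed straight through to generate().
    output_ids = model.generate(**inputs, **gen_kwargs)
    # Decode the first (and only) sequence in the batch.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Usage, with the `model` and `tokenizer` loaded above: `print(summarize_with_model(model, tokenizer, long_text, num_beams=4, max_length=512))`.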

* * *

## About

### Intended uses & limitations

While this model seems to improve factual consistency, **don't take summaries as foolproof; check anything that seems odd**.

In particular, watch for negation errors (e.g., the model says _this thing does not have [ATTRIBUTE]_ when it should have said _this thing has lots of [ATTRIBUTE]_).

- I'm sure someone will write a paper on this eventually (if there isn't one already), but you can usually catch these by comparing a particular statement with what the surrounding sentences imply.
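To find candidates for that manual check quickly, a crude heuristic is to pull out every summary sentence that contains a negation word, then compare each one against its neighbors or the source. This is a hypothetical helper, not part of the model or any library; the word list and the naive sentence splitting are assumptions and will miss some cases.

```python
import re
from typing import List

# Assumed (incomplete) list of negation cues; extend it for your use case.
NEGATION_RE = re.compile(
    r"\b(?:not|no|never|none|nothing|neither|nor|cannot)\b|n't\b",
    re.IGNORECASE,
)


def flag_negations(summary: str) -> List[str]:
    """Return the sentences of `summary` that contain a negation cue.

    Sentences are split naively on whitespace that follows ., ! or ?.
    """
    sentences = re.split(r"(?<=[.!?])\s+", summary.strip())
    return [s for s in sentences if s and NEGATION_RE.search(s)]
```

For example, `flag_negations(result[0]["summary_text"])` lists the sentences of a pipeline summary that are worth double-checking.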

### Training and evaluation data

`kmfoda/booksum` dataset on Hugging Face; read [the original paper here](https://arxiv.org/abs/2105.08209).

- For **initial fine-tuning**, only examples with 12288 input tokens or fewer and 1024 output tokens or fewer were used (_i.e., longer examples were dropped before training_) for memory reasons. A quick analysis showed that inputs in the 12288-16384 token range are a **small** minority in this dataset.
- In addition, this initial training combined the training and validation sets and trained on them in aggregate to increase the effective dataset size. **Therefore, take the validation set results with a grain of salt; primary metrics should (always) be the test set.**
- The **final stages of fine-tuning** used the standard 16384 input/1024 output conventions, preserving the standard input/output lengths (_and truncating longer sequences_). This did not seem to change the loss/performance much.
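The length filtering used for the initial fine-tuning stage can be sketched as a predicate over token counts. This illustrates the idea under assumptions and is not the exact preprocessing code used for this model: `count_tokens` stands in for your tokenizer (e.g. `lambda s: len(tokenizer(s).input_ids)`), and `chapter`/`summary_text` are assumed BookSum column names, so check the dataset card.

```python
from typing import Any, Callable, Dict, List

MAX_INPUT_TOKENS = 12288  # input limit for the initial fine-tuning stage
MAX_OUTPUT_TOKENS = 1024  # summary limit


def within_limits(
    example: Dict[str, str],
    count_tokens: Callable[[str], int],
    text_key: str = "chapter",  # assumed column name
    summary_key: str = "summary_text",  # assumed column name
    max_input: int = MAX_INPUT_TOKENS,
    max_output: int = MAX_OUTPUT_TOKENS,
) -> bool:
    """Keep an example only if both its input and its summary fit the limits."""
    return (
        count_tokens(example[text_key]) <= max_input
        and count_tokens(example[summary_key]) <= max_output
    )


def filter_examples(
    examples: List[Dict[str, str]],
    count_tokens: Callable[[str], int],
    **limits: Any,
) -> List[Dict[str, str]]:
    """Drop (rather than truncate) every example that exceeds the limits."""
    return [ex for ex in examples if within_limits(ex, count_tokens, **limits)]
```

With 🤗 Datasets, the same predicate plugs into `dataset.filter(lambda ex: within_limits(ex, count_tokens))`. The final stage, which truncated instead of dropping, would tokenize with `truncation=True` and `max_length` set to 16384 (inputs) or 1024 (summaries).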

### Eval results

Official results with the [model evaluator](https://huggingface.co/spaces/autoevaluate/model-evaluator) will be computed and posted here.

**Please read the note above: because the validation set was part of the training data, performance on the validation set looks better than the results on the test set will be.** The model achieves the following results on the evaluation set:

- eval_loss: 1.2756
- eval_rouge1: 41.8013
- eval_rouge2: 12.0895
- eval_rougeL: 21.6007
- eval_rougeLsum: 39.5382
- eval_gen_len: 387.2945
- eval_runtime: 13908.4995 (seconds)
- eval_samples_per_second: 0.107
- eval_steps_per_second: 0.027

**predict/test metrics (initial):**

- predict_gen_len: 506.4368
- predict_loss: 2.028
- predict_rouge1: 36.8815
- predict_rouge2: 8.0625
- predict_rougeL: 17.6161
- predict_rougeLsum: 34.9068
- predict_runtime: 2:04:14.37
- predict_samples: 1431
- predict_samples_per_second: 0.192
- predict_steps_per_second: 0.048

\* Evaluating a model this big is not as easy as it seems; I'm doing a bit more investigating.
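For intuition about what the ROUGE numbers above measure: ROUGE-1 is the unigram overlap between a generated summary and a reference, and the reported scores appear to be F-measures scaled to 0-100. The sketch below is a simplified, hypothetical re-implementation (lowercased whitespace tokens, no stemming or punctuation handling), so it will not exactly reproduce scores from the `rouge_score` package.

```python
from collections import Counter


def rouge1_f(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F-measure (0-100) over lowercased whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # matches, clipped to the smaller count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 100 * 2 * precision * recall / (precision + recall)
```

For example, the candidate "the cat sat" against the reference "the cat sat on the mat" has precision 1 and recall 0.5, giving an F-measure of about 66.7.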

* * *

## FAQ

### How can I run inference with this on CPU?

lol (realistically, a model this size is impractically slow on CPU).

### How to run inference over a very long (30k+ tokens) document in batches?

See `summarize.py` in [the code for my HF space Document Summarization](https://huggingface.co/spaces/pszemraj/document-summarization/blob/main/summarize.py) :)

You can also use the same code to split a document into chunks of 4096 tokens (or another size) and iterate over them with the model. This is useful when CUDA memory is limited.
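The chunking idea can be sketched as follows. This is not the code from `summarize.py`, just a minimal illustration: the hypothetical `chunk_token_ids` splits token IDs into windows of `chunk_size` tokens with an optional overlap, and `summarize_in_chunks` summarizes each window independently. The `decode` and `summarize` callables are placeholders for, e.g., `tokenizer.decode` and the `summarizer` pipeline from the How-To section.

```python
from typing import Callable, List, Sequence


def chunk_token_ids(
    token_ids: Sequence[int], chunk_size: int = 4096, overlap: int = 0
) -> List[List[int]]:
    """Split token IDs into windows of at most `chunk_size`, sharing `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(list(token_ids[start : start + chunk_size]))
        if start + chunk_size >= len(token_ids):
            break  # this window already reaches the end of the document
    return chunks


def summarize_in_chunks(
    token_ids: Sequence[int],
    decode: Callable[[List[int]], str],
    summarize: Callable[[str], str],
    chunk_size: int = 4096,
    overlap: int = 0,
) -> List[str]:
    """Summarize each chunk separately; returns one summary per chunk."""
    chunks = chunk_token_ids(token_ids, chunk_size, overlap)
    return [summarize(decode(chunk)) for chunk in chunks]
```

Usage sketch: `ids = tokenizer(long_text).input_ids`, then `summarize_in_chunks(ids, lambda c: tokenizer.decode(c, skip_special_tokens=True), lambda s: summarizer(s)[0]["summary_text"])`, and join the per-chunk summaries. A small overlap can help keep sentences that straddle a boundary from being cut off in both chunks.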

**Update:** see the section on the `textsum` package below.

### How to fine-tune further?

See [train with a script](https://huggingface.co/docs/transformers/run_scripts) and [the summarization scripts](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization).
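If you write your own loop or use `Seq2SeqTrainer` directly instead of the scripts, the main model-specific step is tokenizing inputs and targets to this model's lengths. A minimal sketch for `Dataset.map(..., batched=True)`, assuming the 16384/1024 limits from the final fine-tuning stage and the BookSum column names `chapter`/`summary_text` (an assumption; check the dataset card); `preprocess_batch` is a hypothetical helper.

```python
from typing import Any, Dict, List

MAX_SOURCE_TOKENS = 16384
MAX_TARGET_TOKENS = 1024


def preprocess_batch(
    batch: Dict[str, List[str]],
    tokenizer: Any,
    text_key: str = "chapter",  # assumed column name
    summary_key: str = "summary_text",  # assumed column name
) -> Dict[str, Any]:
    """Tokenize a batch of documents and reference summaries, truncating both."""
    model_inputs = tokenizer(
        batch[text_key], max_length=MAX_SOURCE_TOKENS, truncation=True
    )
    # `text_target` tokenizes the summaries as decoder labels.
    labels = tokenizer(
        text_target=batch[summary_key], max_length=MAX_TARGET_TOKENS, truncation=True
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```

Usage sketch: `dataset.map(lambda b: preprocess_batch(b, tokenizer), batched=True, remove_columns=dataset.column_names)`. Padding, including masking padded label positions with -100, can then be left to `DataCollatorForSeq2Seq`.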

### Are there simpler ways to run this?

Yes: I created a Python package utility for exactly this purpose. It's called [textsum](https://github.com/pszemraj/textsum), and you can use it to load models and summarize things in a few lines of code.

```sh
pip install textsum
```

Use `textsum` in Python with this model:

```python
from textsum.summarize import Summarizer

summarizer = Summarizer(
    model_name_or_path="pszemraj/long-t5-tglobal-xl-16384-book-summary"
)

long_string = "This is a long string of text that will be summarized."
out_str = summarizer.summarize_string(long_string)
print(f"summary: {out_str}")
```

This package provides easy-to-use interfaces for applying summarization models to text documents of arbitrary length. Currently implemented interfaces include a Python API, a CLI, and a shareable demo application.

For details, explanations, and documentation, see the README (_linked above_) or the [wiki](https://github.com/pszemraj/textsum/wiki).

* * *

## Training procedure

### Updates

Updates to this model/model card will be posted here when relevant. The model seems to be fairly converged; if updates/improvements are possible using the `BookSum` dataset, this repo will be updated.

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0006
- train_batch_size: 1
- eval_batch_size: 1
- seed: 10350
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0

\*_Prior training sessions used roughly similar parameters (learning rates were higher); multiple sessions were required as this takes eons to train._
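For reference, the hyperparameters listed above map onto `Seq2SeqTrainingArguments` keywords roughly as follows. This is a sketch, not the actual training script: `output_dir` and `predict_with_generate` are assumptions, and the 4-GPU setup comes from the launcher (e.g. `torchrun`), not from these arguments. The helper shows how `total_train_batch_size` is derived.

```python
from typing import Any, Dict

NUM_DEVICES = 4  # set by the multi-GPU launcher, not by the arguments below

# Hyperparameters from the list above, as Seq2SeqTrainingArguments keywords.
TRAINING_KWARGS: Dict[str, Any] = {
    "output_dir": "./long-t5-xl-booksum",  # hypothetical path
    "learning_rate": 6e-4,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "seed": 10350,
    "gradient_accumulation_steps": 32,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 1.0,
    "predict_with_generate": True,  # assumption: generate summaries during eval
}


def total_train_batch_size(kwargs: Dict[str, Any], num_devices: int) -> int:
    """Effective batch size = per-device batch x accumulation steps x devices."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def build_training_args(kwargs: Dict[str, Any] = TRAINING_KWARGS):
    """Create the arguments object (imports transformers lazily)."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**kwargs)


print(total_train_batch_size(TRAINING_KWARGS, NUM_DEVICES))  # 128
```

The same arithmetic gives the listed `total_eval_batch_size`: 1 per device × 4 devices = 4.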

### Framework versions

- Transformers 4.25.0.dev0
- Pytorch 1.13.0+cu117
- Datasets 2.6.1
- Tokenizers 0.13.1

* * *