|
---
tags:
- generated_from_trainer
- distilbart
model-index:
- name: distilbart-finetuned-summarization
  results: []
license: apache-2.0
datasets:
- cnn_dailymail
- xsum
- samsum
- ccdv/pubmed-summarization
language:
- en
metrics:
- rouge
---
|
|
|
|
|
|
# distilbart-finetuned-summarization
|
|
|
This model is a further fine-tuned version of [distilbart-cnn-12-6](https://huggingface.co/sshleifer/distilbart-cnn-12-6), trained on the combination of four summarization datasets:
|
- [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) |
|
- [samsum](https://huggingface.co/datasets/samsum) |
|
- [xsum](https://huggingface.co/datasets/xsum) |
|
- [ccdv/pubmed-summarization](https://huggingface.co/datasets/ccdv/pubmed-summarization) |
|
|
|
Please check out the official model page and paper:
|
- [sshleifer/distilbart-cnn-12-6](https://huggingface.co/sshleifer/distilbart-cnn-12-6) |
|
- [Pre-trained Summarization Distillation](https://arxiv.org/abs/2010.13002) |
|
|
|
## Training and evaluation data |
|
|
|
One can reproduce the combined dataset with the following code:
|
|
|
```python
from datasets import DatasetDict, concatenate_datasets, load_dataset

# Load the four datasets, renaming columns so that every dataset
# exposes the same "document" and "summary" fields.
xsum_dataset = load_dataset("xsum")
pubmed_dataset = load_dataset("ccdv/pubmed-summarization").rename_column("article", "document").rename_column("abstract", "summary")
cnn_dataset = load_dataset("cnn_dailymail", "3.0.0").rename_column("article", "document").rename_column("highlights", "summary")
samsum_dataset = load_dataset("samsum").rename_column("dialogue", "document")

# Concatenate the matching splits of all four datasets.
summary_train = concatenate_datasets([xsum_dataset["train"], pubmed_dataset["train"], cnn_dataset["train"], samsum_dataset["train"]])
summary_validation = concatenate_datasets([xsum_dataset["validation"], pubmed_dataset["validation"], cnn_dataset["validation"], samsum_dataset["validation"]])
summary_test = concatenate_datasets([xsum_dataset["test"], pubmed_dataset["test"], cnn_dataset["test"], samsum_dataset["test"]])

raw_datasets = DatasetDict(
    {
        "train": summary_train,
        "validation": summary_validation,
        "test": summary_test,
    }
)
```
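
Before training, the combined `document`/`summary` pairs are tokenized. The notebook linked below contains the exact preprocessing; the snippet here is only a minimal sketch, and the length caps of 1024 input tokens and 128 target tokens are illustrative assumptions, not values taken from the notebook:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")

def preprocess(examples):
    # The 1024/128 length caps below are illustrative, not notebook values.
    model_inputs = tokenizer(examples["document"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=examples["summary"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_datasets = raw_datasets.map(
    preprocess, batched=True, remove_columns=raw_datasets["train"].column_names
)
```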
|
|
|
## Inference example |
|
|
|
```python
from transformers import pipeline

pipe = pipeline("text2text-generation", model="lxyuan/distilbart-finetuned-summarization")

text = """The tower is 324 metres (1,063 ft) tall, about the same height as
an 81-storey building, and the tallest structure in Paris. Its base is square,
measuring 125 metres (410 ft) on each side. During its construction, the
Eiffel Tower surpassed the Washington Monument to become the tallest man-made
structure in the world, a title it held for 41 years until the Chrysler Building
in New York City was finished in 1930. It was the first structure to reach a
height of 300 metres. Due to the addition of a broadcasting aerial at the top
of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres
(17 ft). Excluding transmitters, the Eiffel Tower is the second tallest
free-standing structure in France after the Millau Viaduct.
"""

pipe(text)

# Output:
# The Eiffel Tower is the tallest man-made structure in the world .
# The tower is 324 metres tall, about the same height as an 81-storey building .
# Due to the addition of a broadcasting aerial in 1957, it is now taller than
# the Chrysler Building by 5.2 metres .
```
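
Generation parameters can be passed directly through the pipeline call. The values below are illustrative, not settings taken from the notebook:

```python
# Constrain summary length and use greedy decoding (illustrative values).
pipe(text, max_length=128, min_length=30, do_sample=False)
```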
|
|
|
## Training procedure |
|
|
|
Notebook link: [here](https://github.com/LxYuan0420/nlp/blob/main/notebooks/distilbart-finetune-summarisation.ipynb) |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (a sketch assembling them into `Seq2SeqTrainingArguments` follows the list):
- evaluation_strategy: epoch
- save_strategy: epoch
- logging_strategy: epoch
- learning_rate: 2e-05
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- gradient_accumulation_steps: 64
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- weight_decay: 0.01
- save_total_limit: 2
- num_train_epochs: 10
- predict_with_generate: True
- fp16: True
- push_to_hub: True
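
As referenced above, here is a minimal sketch of how these values map onto `Seq2SeqTrainingArguments`; the `output_dir` is a placeholder, not taken from the notebook:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="distilbart-finetuned-summarization",  # placeholder directory
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=64,
    weight_decay=0.01,
    save_total_limit=2,
    num_train_epochs=10,
    predict_with_generate=True,
    fp16=True,
    push_to_hub=True,
)
```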
|
|
|
### Training results |
|
_Training is still in progress_ |
|
|
|
| Epoch | Training Loss | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Gen Len |
|-------|---------------|-----------------|---------|---------|---------|-----------|---------|
| 0     | 1.779700      | 1.719054        | 40.0039 | 17.9071 | 27.8825 | 34.8886   | 88.8936 |
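
ROUGE scores of this kind can be computed with the `evaluate` library; the snippet below is a minimal sketch with made-up example strings, not the notebook's exact metric function:

```python
import evaluate

rouge = evaluate.load("rouge")

# Toy prediction/reference pair purely for illustration.
predictions = ["The Eiffel Tower is the tallest structure in Paris."]
references = ["The Eiffel Tower, at 324 metres, is the tallest structure in Paris."]

# use_stemmer=True is a common setup when reporting summarization ROUGE.
scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
print(scores)  # {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```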
|
|
|
### Framework versions |
|
|
|
- Transformers 4.30.2 |
|
- Pytorch 2.0.1+cu117 |
|
- Datasets 2.13.1 |
|
- Tokenizers 0.13.3 |