This model is a fine-tuned version of [allenai/led-base-16384](https://huggingface.co/allenai/led-base-16384) on the [pszemraj/govreport-summarization-8192](https://huggingface.co/datasets/pszemraj/govreport-summarization-8192) dataset.
It achieves the following results on the evaluation set:
- Loss: 1.2887

Computing the ROUGE metrics on the validation and test sets was not feasible within the processing time and memory limits available on Kaggle at the time of training.
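For reference, the sketch below shows how the ROUGE scores could be computed offline with the Hugging Face `evaluate` library. It is not taken from the original training notebook, and the example strings are placeholders.

```python
# Illustrative sketch (not the original training code): computing ROUGE
# for decoded summaries with the Hugging Face `evaluate` library.
import evaluate

rouge = evaluate.load("rouge")

# `predictions` and `references` are assumed to be lists of decoded strings.
predictions = ["the committee recommends further oversight of the program."]
references = ["the committee recommends additional oversight of the program."]

results = rouge.compute(predictions=predictions, references=references)
print(results)  # e.g. {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```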
## Model description
As described in [Longformer: The Long-Document Transformer](https://arxiv.org/pdf/2004.05150.pdf) by Iz Beltagy, Matthew E. Peters, and Arman Cohan, [Allenai's Longformer Encoder-Decoder (LED)](https://github.com/allenai/longformer#longformer) was initialized from [*bart-base*](https://huggingface.co/facebook/bart-base) since both models share the exact same architecture. To be able to process 16K tokens, *bart-base*'s position embedding matrix was simply copied 16 times.
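As a quick illustration (not part of the original card), the extended context window can be read straight from the base checkpoint's configuration:

```python
# Illustrative check of the LED base checkpoint's extended context window.
from transformers import LEDConfig

config = LEDConfig.from_pretrained("allenai/led-base-16384")
print(config.max_encoder_position_embeddings)  # 16384 (16x BART's 1024 encoder positions)
print(config.max_decoder_position_embeddings)  # 1024
```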
This model is especially interesting for long-range summarization and question answering.
## Intended uses & limitations
[pszemraj/govreport-summarization-8192](https://huggingface.co/datasets/pszemraj/govreport-summarization-8192) is a pre-processed version of [ccdv/govreport-summarization](https://huggingface.co/datasets/ccdv/govreport-summarization), a dataset for the summarization of long documents adapted from this [repository](https://github.com/luyang-huang96/LongDocSum) and this [paper](https://arxiv.org/pdf/2104.02112.pdf).
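The dataset can be loaded with the `datasets` library, as sketched below. The split and column names (`report`, `summary`) are assumptions carried over from the upstream GovReport datasets; check the dataset card to confirm them.

```python
# Illustrative sketch: loading the pre-processed GovReport dataset.
# The split and column names ("report", "summary") are assumptions taken
# from the upstream GovReport datasets; verify them on the dataset card.
from datasets import load_dataset

dataset = load_dataset("pszemraj/govreport-summarization-8192")
print(dataset)                  # available splits and columns

sample = dataset["train"][0]
print(sample["report"][:500])   # long source document
print(sample["summary"][:500])  # reference summary
```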
Allenai's LED model was fine-tuned on this dataset, allowing the summarization of documents of up to 16,384 tokens.
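A minimal inference sketch is shown below. The repository ID is a placeholder for this model's Hub name, and the generation settings (beam search, 512-token summaries) are illustrative rather than the values used during fine-tuning.

```python
# Illustrative inference sketch; replace MODEL_ID with this model's Hub ID.
import torch
from transformers import AutoTokenizer, LEDForConditionalGeneration

MODEL_ID = "<this-model-on-the-hub>"  # placeholder, not a real repository name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = LEDForConditionalGeneration.from_pretrained(MODEL_ID)

long_document = "..."  # report text of up to 16,384 tokens

inputs = tokenizer(long_document, max_length=16384, truncation=True, return_tensors="pt")

# LED expects a global attention mask; putting global attention on the first
# token (<s>) is the usual choice for summarization.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=512,
    num_beams=4,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```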
## Training procedure