Update README.md
# long-t5-tglobal-xl + BookSum

Summarize long text and get a SparkNotes-esque summary of arbitrary topics!

- Generalizes reasonably well to academic & narrative text.
- This is the XL checkpoint, which **from a human-evaluation perspective, produces even better summaries**.

A simple example/use case with the `base` model on ASR is [here](https://longt5-booksum-example.netlify.app/).
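A minimal usage sketch with the `transformers` summarization pipeline is below; the checkpoint id is assumed to match this repo, and the beam-search settings are illustrative defaults rather than tuned values.

```python
# Minimal sketch: summarize a long document with the summarization pipeline.
# The checkpoint id is an assumption for illustration; use this repo's actual id.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="pszemraj/long-t5-tglobal-xl-16384-book-summary",
    device=0,  # -1 for CPU (slow for an XL checkpoint)
)

long_text = open("chapter.txt").read()  # any long document, up to ~16384 tokens

result = summarizer(
    long_text,
    max_length=1024,         # BookSum-style summaries are capped at 1024 tokens
    num_beams=4,             # beam search; other generation kwargs pass through
    no_repeat_ngram_size=3,
    early_stopping=True,
)
print(result[0]["summary_text"])
```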
## Model description
## Intended uses & limitations
While this model seems to improve factual consistency, **do not take summaries to be foolproof; check anything that seems odd**.

Specifically, watch for negation statements (i.e., the model says _This thing does not have <ATTRIBUTE>_ where it should have said _This thing has a lot of <ATTRIBUTE>_).

- I'm sure someone will write a paper on this eventually (if there isn't one already), but you can usually fact-check these statements by paying attention to the sentences surrounding a claim made by the model; a rough sketch of that check follows.
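As a starting point for that kind of check, here is a rough, hypothetical heuristic (not from this model card) that flags negation-bearing sentences in a summary together with their neighbors for manual review:

```python
# Hypothetical heuristic: surface sentences with negation cues plus their
# neighbors, so suspicious claims can be checked against the source by hand.
import re

NEGATION_CUES = re.compile(r"\b(not|no|never|none|nothing|lacks?|without)\b", re.I)

def flag_negations(summary: str, window: int = 1):
    """Yield (claim, surrounding sentences) for sentences containing negation cues."""
    sentences = re.split(r"(?<=[.!?])\s+", summary)
    for i, sent in enumerate(sentences):
        if NEGATION_CUES.search(sent):
            lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
            yield sent, sentences[lo:hi]

for claim, context in flag_negations("The castle has no moat. It was built in 1200."):
    print("CHECK:", claim, "| context:", " ".join(context))
```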
## Training and evaluation data
The `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209).

- **Initial fine-tuning** used only examples with 12288 input tokens or fewer and 1024 output tokens or fewer, for memory reasons. Per a brief analysis, summaries in the 12288-16384 token range are a **small** minority in this dataset; a sketch of this length filtering follows the list.
- In addition, this initial training combined the training and validation sets and trained on them in aggregate to increase the functional dataset size. **Therefore, take the validation set results with a grain of salt; the primary metrics should (always) be the test set.**
- The **final phases of fine-tuning** used the standard convention of 16384 input / 1024 output tokens, keeping everything (truncating longer sequences). This did not appear to change the loss/performance much.
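A hypothetical reconstruction of the initial-phase length filtering described above; the column names (`chapter`, `summary_text`) are assumptions about the `kmfoda/booksum` schema.

```python
# Sketch of the initial fine-tuning data prep: combine train + validation,
# then keep only examples within the 12288-input / 1024-output token caps.
from datasets import concatenate_datasets, load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-xl")
booksum = load_dataset("kmfoda/booksum")

def within_limits(example, max_input=12288, max_output=1024):
    """Keep examples whose tokenized lengths fit the initial fine-tuning caps."""
    n_input = len(tokenizer(example["chapter"]).input_ids)
    n_output = len(tokenizer(example["summary_text"]).input_ids)
    return n_input <= max_input and n_output <= max_output

train_plus_val = concatenate_datasets([booksum["train"], booksum["validation"]])
initial_train = train_plus_val.filter(within_limits)
```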
Official results with the [model evaluator](https://huggingface.co/spaces/autoevaluate/model-evaluator) will be computed and posted here.
**Please read the note above: due to the training methods, these figures look better than the test set results will be.** The model achieves the following results on the evaluation set:

- eval_loss: 1.2756
- eval_rouge1: 41.8013
- eval_rouge2: 12.0895
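In the meantime, test-set numbers can be approximated locally; a sketch is below, reusing the `summarizer` pipeline from the intro example and the same assumed column names:

```python
# Sketch: compute ROUGE on a small sample of the booksum *test* split
# (per the note above, test-set metrics are the ones to trust).
# Requires: pip install evaluate rouge-score
import evaluate
from datasets import load_dataset

rouge = evaluate.load("rouge")
test = load_dataset("kmfoda/booksum", split="test").select(range(8))  # tiny sample

predictions = [
    summarizer(ex["chapter"], max_length=1024)[0]["summary_text"] for ex in test
]
references = [ex["summary_text"] for ex in test]

print(rouge.compute(predictions=predictions, references=references))
```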
### Updates
Updates to this model/model card will be posted here as relevant. The model seems fairly converged; if updates/improvements are possible using the `BookSum` dataset, this repo will be updated.
### Training hyperparameters