pszemraj committed on
Commit 09901c4 · 1 Parent(s): 63cf97c

Update README.md

Files changed (1)
  1. README.md +13 -9
README.md CHANGED
@@ -18,9 +18,11 @@ inference: False

  # long-t5-tglobal-xl + BookSum

- - summarize long text and get a SparkNotes-esque summary of arbitrary topics!
- - generalizes reasonably well to academic & narrative text. This is the XL checkpoint, which **from a human-evaluation perspective, produces even better summaries**.
- - A simple example/use case with the `base` model on ASR is [here](https://longt5-booksum-example.netlify.app/).
+ Summarize long text and get a SparkNotes-esque summary of arbitrary topics!
+ - generalizes reasonably well to academic & narrative text.
+ - This is the XL checkpoint, which **from a human-evaluation perspective, produces even better summaries**.
+
+ A simple example/use case with the `base` model on ASR is [here](https://longt5-booksum-example.netlify.app/).

  ## Model description

@@ -56,13 +58,15 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl

  ## Intended uses & limitations

- - while this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.
- - specifically: negation statements (i.e. model says: _This thing does not have <ATTRIBUTE>_ where instead it should have said _This thing has a lot of <ATTRIBUTE>_).
- - I'm sure someone will write a paper on this eventually (if there isn't one already), but you can usually fact-check this by paying attention to the surrounding sentences of a claim by the model.
+ While this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.
+
+ specifically: negation statements (i.e. model says: _This thing does not have <ATTRIBUTE>_ where instead it should have said _This thing has a lot of <ATTRIBUTE>_).
+ - I'm sure someone will write a paper on this eventually (if there isn't one already), but you can usually fact-check this by paying attention to the surrounding sentences of a claim by the model.

  ## Training and evaluation data

- - `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209).
+ `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209).
+
  - **Initial fine-tuning** only used input text with 12288 tokens input or less and 1024 tokens output or less for memory reasons. Per brief analysis, summaries in the 12288-16384 range in this dataset are in the **small** minority
  - In addition, this initial training combined the training and validation sets and trained on these in aggregate to increase the functional dataset size. **Therefore, take the validation set results with a grain of salt; primary metrics should be (always) the test set.**
  - **final phases of fine-tuning** used the standard conventions of 16384 input/1024 output keeping everything (truncating longer sequences). This did not appear to change the loss/performance much.
@@ -71,7 +75,7 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl

  Official results with the [model evaluator](https://huggingface.co/spaces/autoevaluate/model-evaluator) will be computed and posted here.

- **Please read the note above as due to training methods it looks better than the test set results will be**. The model achieves the following results on the evaluation set:
+ **Please read the note above as due to training methods, it looks better than the test set results will be**. The model achieves the following results on the evaluation set:
  - eval_loss: 1.2756
  - eval_rouge1: 41.8013
  - eval_rouge2: 12.0895
@@ -96,7 +100,7 @@ lol

  ### Updates

- Updates to this model/model card will be posted here as relevant. The model seems fairly converged, but if updates/improvements can be made using `kmfoda/booksum`, this repo will be updated.
+ Updates to this model/model card will be posted here as relevant. The model seems fairly converged; if updates/improvements are possible using the `BookSum` dataset, this repo will be updated.

  ### Training hyperparameters

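The README's training-data notes describe two different length policies: initial fine-tuning dropped any pair longer than 12288 input / 1024 output tokens, while the final phases kept everything at 16384 input / 1024 output and truncated longer sequences. A minimal sketch of that difference follows. It is an illustration, not the author's training code: it assumes the pairs are already tokenized into lists of token ids and that truncation keeps the leading tokens (the card does not say which end is cut). The `Example`, `initial_phase`, and `final_phase` names are hypothetical; only the three limits come from the card.

```python
from dataclasses import dataclass
from typing import List

# Limits quoted in the model card; everything else here is illustrative.
INITIAL_MAX_INPUT = 12288
FINAL_MAX_INPUT = 16384
MAX_OUTPUT = 1024


@dataclass
class Example:
    # Hypothetical container for an already-tokenized (input, summary) pair.
    input_ids: List[int]
    label_ids: List[int]


def initial_phase(examples: List[Example]) -> List[Example]:
    """Initial fine-tuning: drop any pair whose input or summary exceeds the limits."""
    return [
        ex
        for ex in examples
        if len(ex.input_ids) <= INITIAL_MAX_INPUT and len(ex.label_ids) <= MAX_OUTPUT
    ]


def final_phase(examples: List[Example]) -> List[Example]:
    """Final phases: keep every pair, truncating over-length sequences (head kept, by assumption)."""
    return [
        Example(ex.input_ids[:FINAL_MAX_INPUT], ex.label_ids[:MAX_OUTPUT])
        for ex in examples
    ]


if __name__ == "__main__":
    data = [
        Example([0] * 10_000, [0] * 500),    # fits both policies unchanged
        Example([0] * 14_000, [0] * 800),    # dropped initially, kept later
        Example([0] * 20_000, [0] * 1_500),  # dropped initially, truncated later
    ]
    print(len(initial_phase(data)))
    print([len(ex.input_ids) for ex in final_phase(data)])
```

Running it prints `1` and then `[10000, 14000, 16384]`: the 14k-token pair is excluded at first and kept later, and the 20k-token pair is cut down to 16384 input tokens. This is why the card notes that only the small minority of examples in the 12288-16384 range was affected by the switch.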