|
---
language: en
tags:
- summarization
license: mit
model-index:
- name: SamuelAllen123/t5-efficient-large-nl36_fine_tune_sum_V2
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: samsum
      type: samsum
      config: samsum
      split: test
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 50.5049
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 25.6469
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 41.7544
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 46.2055
      verified: true
    - name: loss
      type: loss
      value: 1.5158178806304932
      verified: true
    - name: gen_len
      type: gen_len
      value: 24.0342
      verified: true
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: cnn_dailymail
      type: cnn_dailymail
      config: 3.0.0
      split: test
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 34.4055
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 14.127
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 24.3353
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 31.6582
      verified: true
    - name: loss
      type: loss
      value: 2.4456119537353516
      verified: true
    - name: gen_len
      type: gen_len
      value: 45.928
      verified: true
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: samsum
      type: samsum
      config: samsum
      split: train
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 54.933
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 31.7965
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 47.0057
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 51.2027
      verified: true
    - name: loss
      type: loss
      value: 1.130684494972229
      verified: true
    - name: gen_len
      type: gen_len
      value: 23.7989
      verified: true
---
|
Pass the input text directly to the model; there is no need to prepend "summarize: " to the string.
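A minimal inference sketch under that assumption (the example dialogue is illustrative, not taken from the dataset):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "SamuelAllen123/t5-efficient-large-nl36_fine_tune_sum_V2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Note: the dialogue is passed as-is, with no "summarize: " prefix.
dialogue = "Amanda: I baked cookies. Do you want some?\nJerry: Sure! I'll come by later."
inputs = tokenizer(dialogue, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```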
|
|
|
Trained on the SAMSum train split.
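The test-split ROUGE numbers in the metadata above can in principle be reproduced along these lines. This is a hedged sketch using the `evaluate` library, reusing the `tokenizer` and `model` from the example above; the generation settings (`max_new_tokens=60`) are assumptions and may not match the verified evaluation run exactly:

```python
import evaluate
from datasets import load_dataset

rouge = evaluate.load("rouge")
test = load_dataset("samsum", split="test")

# Generate a summary for each test dialogue, then score against references.
preds = []
for example in test:
    inputs = tokenizer(example["dialogue"], return_tensors="pt", truncation=True)
    ids = model.generate(**inputs, max_new_tokens=60)
    preds.append(tokenizer.decode(ids[0], skip_special_tokens=True))

print(rouge.compute(predictions=preds, references=test["summary"]))
```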
|
|
|
Optimizer and learning-rate schedule used for training:
|
|
|
```python
import torch
from transformers import get_scheduler

# `model` is the T5 model being fine-tuned (loaded as in the inference example above).
# Biases and layer-norm weights are placed in a separate parameter group.
no_decay = ["bias", "LayerNorm.weight", "layer_norm.weight"]
optimizer_grouped_parameters = [
    {
        "params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
        "weight_decay": 0.0,
    },
    {
        "params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)],
        "weight_decay": 0.0,  # both groups use 0.0, so the split has no effect here
    },
]

lr = 0.00005
optimizer = torch.optim.RAdam(optimizer_grouped_parameters, lr=lr)

# Linear decay with no warmup.
lr_scheduler = get_scheduler(
    name="linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=50005,
)
```
|
|
|
Training ran for only 10K steps with a batch size of 10.
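For context, here is a minimal training-loop sketch consistent with those settings, reusing the optimizer and scheduler above. The SAMSum preprocessing, max lengths, and collator choice are assumptions for illustration, not the exact script used:

```python
from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import DataCollatorForSeq2Seq

dataset = load_dataset("samsum", split="train")

def preprocess(batch):
    # Assumed max lengths; dialogues as inputs, summaries as labels.
    model_inputs = tokenizer(batch["dialogue"], truncation=True, max_length=512)
    labels = tokenizer(batch["summary"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)
collator = DataCollatorForSeq2Seq(tokenizer, model=model)
loader = DataLoader(tokenized, batch_size=10, shuffle=True, collate_fn=collator)

# Run exactly 10K optimizer steps, cycling through the data as needed.
model.train()
step = 0
while step < 10_000:
    for batch in loader:
        loss = model(**batch).loss  # seq2seq cross-entropy loss
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        step += 1
        if step >= 10_000:
            break
```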
|
|
|
If you want more info, feel free to message me or email me at [email protected].