Model description

This is the T5-3B model for the "explain" component of System 4's "Classify then explain" pipeline, as described in our paper "Just-DREAM-about-it: Figurative Language Understanding with DREAM-FLUTE" (FigLang workshop @ EMNLP 2022; arXiv link: https://arxiv.org/abs/2210.16407).

System 4: Two-step System - Classify then explain

In contrast to Systems 1 to 3, where the entailment/contradiction label and the associated explanation are predicted jointly, System 4 uses a two-step "classify then explain" pipeline. This model is the "explain" component of that pipeline: given a premise, a hypothesis, and an already predicted label, it generates the explanation. The input-output format is:

Input <Premise> <Hypothesis> <Label> 
Output <Explanation>
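
Judging from the usage example in the next section, the three input fields are filled into a fixed prompt template. A minimal sketch of a helper that builds such an input string follows. The function name `build_explain_input` is hypothetical, the template is copied from that usage example rather than confirmed against the training code, and the exact spelling of the "Contradiction" label is an assumption.

```python
# Hypothetical helper: the template mirrors the usage example in this card.
QUESTION = "Is there a contradiction or entailment between the premise and hypothesis?"


def build_explain_input(premise: str, hypothesis: str, label: str) -> str:
    """Assemble the "explain" input from a premise, hypothesis, and label."""
    # Assumption: labels are spelled "Entailment" / "Contradiction".
    if label not in {"Entailment", "Contradiction"}:
        raise ValueError(f"unexpected label: {label!r}")
    return (
        f"Premise: {premise} Hypothesis: {hypothesis} "
        f"{QUESTION} Answer : {label}. Explanation : "
    )
```

The returned string can be passed to `tokenizer.encode` in place of a hand-written `input_string`.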

How to use this model?

We provide a quick example of how you can try out the "explain" component of System 4 in our paper with just a few lines of code:

>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
>>> model = AutoModelForSeq2SeqLM.from_pretrained("allenai/System4_explain_FigLang2022")

>>> tokenizer = AutoTokenizer.from_pretrained("t5-3b")
>>> input_string = "Premise: It is wrong to lie to children. Hypothesis: Telling lies to the young is like clippin the wings of a butterfly. Is there a contradiction or entailment between the premise and hypothesis? Answer : Entailment. Explanation : "
>>> input_ids = tokenizer.encode(input_string, return_tensors="pt")
>>> output = model.generate(input_ids, max_length=200)
>>> tokenizer.batch_decode(output, skip_special_tokens=True)
['Clipping the wings of a butterfly means that the butterfly will never be able to fly, so lying to children is like doing the same.']

More details about DREAM-FLUTE ...

For more details about DREAM-FLUTE, please refer to our paper (https://arxiv.org/abs/2210.16407).

This model is part of our DREAM series of works, a line of research in which we use scene elaboration to build a "mental model" of the situation described in a text. Check out our GitHub Repo for more!

More details about this model ...

Training and evaluation data

We use the FLUTE dataset for the FigLang2022SharedTask (https://huggingface.co/datasets/ColumbiaNLP/FLUTE) to train this model. The shared task provides ∼7500 samples as the training set. We used an 80-20 split of these samples to create our own training (6027 samples) and validation (1507 samples) partitions, on which we built our models. For details on how we use the training data provided in the FigLang2022 shared task, please refer to https://github.com/allenai/dream/blob/main/FigLang2022SharedTask/Process_Data_Train_Dev_split.ipynb.
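
As a rough illustration of how an 80-20 split yields the partition sizes quoted above, here is a minimal sketch. It is not the procedure from the linked notebook: the function name `split_train_validation`, the shuffling strategy, and the seed are all assumptions made for the example.

```python
import random


def split_train_validation(samples, train_fraction=0.8, seed=42):
    """Shuffle a copy of `samples` and split it into train/validation lists."""
    shuffled = list(samples)
    # Assumption: a seeded shuffle; the actual split may have been done differently.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 6027 + 1507 = 7534 samples in total, per the partition sizes stated above.
train, validation = split_train_validation(range(7534))
print(len(train), len(validation))  # 6027 1507
```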

Model details

This model is a fine-tuned version of t5-3b.

It achieves the following results on the evaluation set:

  • Loss: 1.0331
  • Rouge1: 53.8485
  • Rouge2: 32.8855
  • Rougel: 46.6534
  • Rougelsum: 46.6435
  • Gen Len: 29.7724
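
For readers unfamiliar with the metric, the sketch below computes a simplified ROUGE-1 F1 score (unigram overlap) between a reference and a generated explanation. It is illustrative only and not necessarily how the scores above were computed: whitespace tokenization and lowercasing are simplifying assumptions, and standard ROUGE implementations apply their own tokenization and optional stemming. The function returns a value between 0 and 1, whereas the numbers above appear to be reported on a 0 to 100 scale.

```python
from collections import Counter


def rouge1_f1(reference: str, prediction: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    ref_counts = Counter(reference.lower().split())
    pred_counts = Counter(prediction.lower().split())
    # Clipped overlap: each unigram counts at most as often as it occurs in both.
    overlap = sum((ref_counts & pred_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```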

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 2
  • total_eval_batch_size: 2
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
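
The relationship between the per-device and total batch sizes, and the approximate number of optimizer steps they imply, can be worked out as follows. This is a back-of-the-envelope sketch: it assumes no gradient accumulation (none is listed above) and that an incomplete final batch is dropped, and it uses the training partition size of 6027 samples stated in this card.

```python
train_batch_size = 1   # per-device batch size, from the list above
num_devices = 2
num_epochs = 3
train_samples = 6027   # training partition size stated in this card

# Total batch size is the per-device batch size times the number of devices.
total_train_batch_size = train_batch_size * num_devices
# Assumption: an incomplete final batch is dropped, hence floor division.
steps_per_epoch = train_samples // total_train_batch_size
total_steps = steps_per_epoch * num_epochs

print(total_train_batch_size, steps_per_epoch, total_steps)  # 2 3013 9039
```

Under these assumptions, step 9000 falls just short of the end of the third epoch.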

Training results

Training Loss  Epoch  Step  Validation Loss  Rouge1   Rouge2   Rougel   Rougelsum  Gen Len
1.3633         0.33   1000  1.2468           44.8469  24.3002  37.9797  37.9943    18.8341
1.2531         0.66   2000  1.1445           45.7234  25.6755  39.5817  39.5653    18.8786
1.2148         1.0    3000  1.0806           47.4244  27.6605  41.0803  41.0628    18.7339
0.7554         1.33   4000  1.1006           47.5505  28.2781  41.385   41.3774    18.6556
0.7761         1.66   5000  1.0671           48.583   29.6223  42.5451  42.5247    18.6821
0.7777         1.99   6000  1.0331           48.8329  30.5086  43.0964  43.0586    18.6881
0.4378         2.32   7000  1.1978           48.6239  30.2101  42.8863  42.8851    18.7259
0.4715         2.66   8000  1.1545           49.1311  31.0582  43.523   43.5043    18.7598
0.462          2.99   9000  1.1471           49.4022  31.7946  44.0345  44.0128    18.7200
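
The lowest validation loss in this log, 1.0331 at step 6000, matches the evaluation loss reported under "Model details". The card does not state how the released checkpoint was chosen, so reading this as "the checkpoint with the best validation loss was kept" is an assumption. A minimal sketch of that selection over the logged values:

```python
# (step, validation_loss) pairs copied from the training results above.
validation_log = [
    (1000, 1.2468), (2000, 1.1445), (3000, 1.0806),
    (4000, 1.1006), (5000, 1.0671), (6000, 1.0331),
    (7000, 1.1978), (8000, 1.1545), (9000, 1.1471),
]

# Pick the logged step with the smallest validation loss.
best_step, best_loss = min(validation_log, key=lambda row: row[1])
print(best_step, best_loss)  # 6000 1.0331
```

Validation loss rises again after step 6000 while the training loss keeps falling, which is the usual sign of overfitting in the third epoch, even though the ROUGE scores continue to improve slightly.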

Framework versions

  • Transformers 4.22.0.dev0
  • PyTorch 1.12.1+cu113
  • Datasets 2.4.0
  • Tokenizers 0.12.1