metadata

license: afl-3.0

Generating Declarative Statements from QA Pairs

Some rule-based models can already accomplish this task, but I haven't seen any transformer-based models that do so. Therefore, I trained this model, based on BART-base, to transform QA pairs into declarative statements.
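As a rough illustration of the task, here is a minimal sketch of how a QA pair could be passed to a BART-style seq2seq checkpoint with the `transformers` library. The input template (`question: ... answer: ...`), the helper names `format_qa_pair` and `qa_to_statement`, and the generation settings are assumptions made for this example and are not confirmed by this card. Check how the model was actually trained before relying on them.

```python
def format_qa_pair(question, answer):
    # ASSUMPTION: this "question: ... answer: ..." template is hypothetical;
    # use whatever input format the checkpoint was actually trained on.
    return f"question: {question.strip()} answer: {answer.strip()}"


def qa_to_statement(question, answer, model_id):
    # Imported lazily so the formatting helper works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # model_id is the Hub id or local path of the checkpoint (not given here).
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    inputs = tokenizer(format_qa_pair(question, answer), return_tensors="pt")
    # Beam search with a short length cap; both values are illustrative guesses.
    output_ids = model.generate(**inputs, num_beams=4, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `qa_to_statement("What day is it today?", "Tuesday", model_id=...)` with this model's id should return a single declarative sentence that combines the question and the answer.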

I compared my model with other rule-based models, including

paper1 (2019), which proposes the 2 Encoder Pointer-Gen model

and

paper2 (2021), which proposes the RBV2 model.

Here are the results compared with the 2 Encoder Pointer-Gen model (on the test sets released by paper1):

Test on testset

Model	2 Encoder Pointer-Gen(2019)	BART-base
BLEU	74.05	78.878
ROUGE-1	91.24	91.937
ROUGE-2	81.91	82.177
ROUGE-L	86.25	87.172

Test on NewsQA testset

Model	2 Encoder Pointer-Gen	BART
BLEU	73.29	74.966
ROUGE-1	95.38	89.328
ROUGE-2	87.18	78.538
ROUGE-L	93.65	87.583

Test on free_base testset

Model	2 Encoder Pointer-Gen	BART
BLEU	75.41	76.082
ROUGE-1	93.46	92.693
ROUGE-2	82.29	81.216
ROUGE-L	87.5	86.834
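For readers who want to see what the ROUGE columns above measure, here is a minimal sketch of sentence-level ROUGE-N and ROUGE-L F1 in plain Python. It is not the evaluation script used to produce these tables: the whitespace tokenizer and the function names are assumptions made for illustration, and established ROUGE tools also handle punctuation, stemming, and corpus-level aggregation, so their numbers will differ. The sketch returns values in [0, 1], while the tables use a 0-100 scale.

```python
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenization (an assumption of this sketch).
    return text.lower().split()


def f1(matches, n_candidate, n_reference):
    # Harmonic mean of precision and recall; 0.0 when there is no overlap.
    if matches == 0 or n_candidate == 0 or n_reference == 0:
        return 0.0
    precision = matches / n_candidate
    recall = matches / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    # F1 over clipped n-gram overlap between candidate and reference.
    cand, ref = tokenize(candidate), tokenize(reference)
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum((cand_ngrams & ref_ngrams).values())
    return f1(overlap, sum(cand_ngrams.values()), sum(ref_ngrams.values()))


def lcs_length(a, b):
    # Longest common subsequence length via row-by-row dynamic programming.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    # F1 based on the longest common subsequence of tokens.
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))
```

For example, `rouge_l("today is tuesday", "it is tuesday today")` rewards the shared in-order span "is tuesday" but not the displaced "today", which is why ROUGE-L is sensitive to word order while ROUGE-1 is not.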

As paper2 doesn't release its own dataset, it's hard to make a fair comparison. But according to the results in paper2, the BLEU, ROUGE-2, and ROUGE-L scores of their model are lower than those of MPG, which is exactly the 2 Encoder Pointer-Gen model (RBV2 scores higher only on ROUGE-1).

Model	BLEU	ROUGE-1	ROUGE-2	ROUGE-L
RBV2	74.8	95.3	83.1	90.3
RBV2+BERT	71.5	93.9	82.4	89.5
RBV2+RoBERTa	72.1	94	83.1	89.8
RBV2+XLNET	71.2	93.6	82.3	89.4
MPG	75.8	94.4	87.4	91.6

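Since MPG outperforms RBV2 on most of these metrics, and my model scores higher than MPG on BLEU on all three test sets above, there are reasons to believe that my model performs better than RBV2.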

To sum up, my model performs nearly as well as the SOTA rule-based model when evaluated with BLEU and ROUGE scores. However, the generated sentence patterns lack diversity.

(It's worth mentioning that even though I tried my best to conduct objective tests, the test sets I could find were more or less different from those described in the papers.)