license: afl-3.0
Generating Declarative Statements from QA Pairs
Rule-based models for this task already exist, but I haven't seen any transformer-based models that do it. Therefore, I trained this model, based on BART-base, to transform QA pairs into declarative statements.
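A minimal sketch of how a model like this might be called through the Hugging Face `transformers` library. The input template (`question: ... answer: ...`), the helper names, and the `model_name` argument are assumptions for illustration only; check the checkpoint's own documentation for the exact input format it was trained on.

```python
def format_qa_pair(question: str, answer: str) -> str:
    """Join a QA pair into one input string (assumed template, not confirmed)."""
    return f"question: {question.strip()} answer: {answer.strip()}"


def generate_statement(question: str, answer: str, model_name: str) -> str:
    """Generate a declarative statement with a BART-style seq2seq checkpoint.

    `model_name` is a placeholder for whichever checkpoint identifier you use.
    """
    # Imported lazily so the formatting helper works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Encode the combined QA string and decode the best beam back to text.
    inputs = tokenizer(format_qa_pair(question, answer), return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=64, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(format_qa_pair("What day is it today?", "Tuesday"))
```

The generation settings (`max_length=64`, `num_beams=4`) are arbitrary starting points rather than the values used in training or evaluation.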
I compared my model with rule-based models, including
paper1 (2019), which proposes the 2 Encoder Pointer-Gen model,
and
paper2 (2021), which proposes the RBV2 model.
Here are the results compared with the 2 Encoder Pointer-Gen model (on the test sets released by paper1):
Test on testset
Model | 2 Encoder Pointer-Gen (2019) | BART-base |
---|---|---|
BLEU | 74.05 | 78.878 |
ROUGE-1 | 91.24 | 91.937 |
ROUGE-2 | 81.91 | 82.177 |
ROUGE-L | 86.25 | 87.172 |
Test on NewsQA testset
Model | 2 Encoder Pointer-Gen | BART |
---|---|---|
BLEU | 73.29 | 74.966 |
ROUGE-1 | 95.38 | 89.328 |
ROUGE-2 | 87.18 | 78.538 |
ROUGE-L | 93.65 | 87.583 |
Test on free_base testset
Model | 2 Encoder Pointer-Gen | BART |
---|---|---|
BLEU | 75.41 | 76.082 |
ROUGE-1 | 93.46 | 92.693 |
ROUGE-2 | 82.29 | 81.216 |
ROUGE-L | 87.5 | 86.834 |
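For readers unfamiliar with the metrics in these tables, here is a simplified sketch of ROUGE-1 (unigram overlap) and ROUGE-L (longest common subsequence) as F1 scores. It assumes lowercase whitespace tokenization and a single reference, and it is not the scoring script used to produce the numbers above, so its values will not match an official scorer exactly. Multiply by 100 to get the scale used in the tables.

```python
from collections import Counter


def _f1(overlap: int, cand_len: int, ref_len: int) -> float:
    """Harmonic mean of precision (overlap/cand_len) and recall (overlap/ref_len)."""
    if overlap == 0 or cand_len == 0 or ref_len == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return _f1(overlap, len(cand), len(ref))


def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1: based on the longest common subsequence of tokens."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return _f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    hyp = "today is tuesday"
    ref = "it is tuesday today"
    print(f"ROUGE-1: {rouge_1(hyp, ref):.3f}  ROUGE-L: {rouge_l(hyp, ref):.3f}")
```

ROUGE-L rewards matching word order, which is why a statement containing all the right words in a different order can score high on ROUGE-1 but lower on ROUGE-L.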
As paper2 doesn't release its own dataset, it's hard to make a fair comparison. But according to the results in paper2, the BLEU, ROUGE-2, and ROUGE-L scores of their model are lower than those of MPG, which is exactly the 2 Encoder Pointer-Gen model (plain RBV2 is slightly higher on ROUGE-1 only).
Model | BLEU | ROUGE-1 | ROUGE-2 | ROUGE-L |
---|---|---|---|---|
RBV2 | 74.8 | 95.3 | 83.1 | 90.3 |
RBV2+BERT | 71.5 | 93.9 | 82.4 | 89.5 |
RBV2+RoBERTa | 72.1 | 94 | 83.1 | 89.8 |
RBV2+XLNET | 71.2 | 93.6 | 82.3 | 89.4 |
MPG | 75.8 | 94.4 | 87.4 | 91.6 |
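The gaps in this table can be checked with a few lines of arithmetic. The sketch below copies the scores from the table and reports MPG minus each RBV2 variant, so a positive value means MPG scored higher. The names `SCORES` and `deltas_vs` are illustrative only.

```python
METRICS = ("BLEU", "ROUGE-1", "ROUGE-2", "ROUGE-L")

# Scores copied from the table above (as reported in paper2).
SCORES = {
    "RBV2": (74.8, 95.3, 83.1, 90.3),
    "RBV2+BERT": (71.5, 93.9, 82.4, 89.5),
    "RBV2+RoBERTa": (72.1, 94.0, 83.1, 89.8),
    "RBV2+XLNET": (71.2, 93.6, 82.3, 89.4),
    "MPG": (75.8, 94.4, 87.4, 91.6),
}


def deltas_vs(baseline: str, scores: dict = SCORES) -> dict:
    """Return baseline-minus-model differences per metric, rounded to 0.1."""
    base = scores[baseline]
    return {
        name: {m: round(b - v, 1) for m, b, v in zip(METRICS, base, values)}
        for name, values in scores.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, row in deltas_vs("MPG").items():
        print(f"{name:14s}", row)
```

The only negative difference is ROUGE-1 for plain RBV2; on every other metric and variant, MPG scores higher.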
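Since MPG outscores RBV2 on most of these metrics and my model is competitive with MPG (the 2 Encoder Pointer-Gen model) in the tests above, there is reason to believe that my model performs better than RBV2.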
To sum up, my model performs nearly as well as the SOTA rule-based model when evaluated with BLEU and ROUGE scores. However, its output sentence patterns lack diversity.
(It's worth mentioning that even though I tried my best to conduct objective tests, the test sets I could find differ somewhat from those described in the paper.)