---
license: afl-3.0
---

# Generating Declarative Statements from QA Pairs

There are already some rule-based models that can accomplish this task, but I haven't seen any transformer-based models that can do so. Therefore, I trained this model, based on `Bart-base`, to transform QA pairs into declarative statements.

I compared my model with other rule-based models, including

> [paper1](https://aclanthology.org/D19-5401.pdf) (2019), which proposes the **2 Encoder Pointer-Gen model**

and

> [paper2](https://arxiv.org/pdf/2112.03849.pdf) (2021), which proposes the **RBV2 model**

**Here are the results compared to the 2 Encoder Pointer-Gen model (on the test set released by paper1)**

Test on testset

| Metric  | 2 Encoder Pointer-Gen (2019) | BART-base  |
| ------- | ---------------------------- | ---------- |
| BLEU    | 74.05                        | **78.878** |
| ROUGE-1 | 91.24                        | **91.937** |
| ROUGE-2 | 81.91                        | **82.177** |
| ROUGE-L | 86.25                        | **87.172** |

Test on NewsQA testset

| Metric  | 2 Encoder Pointer-Gen | BART-base  |
| ------- | --------------------- | ---------- |
| BLEU    | 73.29                 | **74.966** |
| ROUGE-1 | **95.38**             | 89.328     |
| ROUGE-2 | **87.18**             | 78.538     |
| ROUGE-L | **93.65**             | 87.583     |

Test on free_base testset

| Metric  | 2 Encoder Pointer-Gen | BART-base  |
| ------- | --------------------- | ---------- |
| BLEU    | 75.41                 | **76.082** |
| ROUGE-1 | **93.46**             | 92.693     |
| ROUGE-2 | **82.29**             | 81.216     |
| ROUGE-L | **87.5**              | 86.834     |

**As paper2 doesn't release its own dataset, it's hard to make a fair comparison.
But according to the results in paper2, their models score lower than MPG, which is exactly the 2 Encoder Pointer-Gen model, on BLEU, ROUGE-2, and ROUGE-L (only the plain RBV2 variant scores higher on ROUGE-1).**

| Model        | BLEU | ROUGE-1 | ROUGE-2 | ROUGE-L |
| ------------ | ---- | ------- | ------- | ------- |
| RBV2         | 74.8 | 95.3    | 83.1    | 90.3    |
| RBV2+BERT    | 71.5 | 93.9    | 82.4    | 89.5    |
| RBV2+RoBERTa | 72.1 | 94      | 83.1    | 89.8    |
| RBV2+XLNET   | 71.2 | 93.6    | 82.3    | 89.4    |
| MPG          | 75.8 | 94.4    | 87.4    | 91.6    |

Since my model scores higher than the 2 Encoder Pointer-Gen model on BLEU on all three test sets, with mixed ROUGE results, there is reason to believe that it also performs better than RBV2, although this is only an indirect comparison.

To sum up, my model performs nearly as well as the SOTA rule-based model when evaluated with BLEU and ROUGE scores. However, the sentence patterns it produces lack diversity.

(It's worth mentioning that even though I tried my best to conduct objective tests, the test sets I could find differ somewhat from the ones described in the papers.)
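
For readers who want to sanity-check scores like the ones above, here is a minimal sketch of how ROUGE-N and ROUGE-L F1 can be computed for one generated statement against one reference. It is an illustration only, not the scoring script behind the tables: it assumes lowercase whitespace tokenization and a single reference, it returns values in the 0 to 1 range rather than 0 to 100, and the function names (`rouge_n`, `rouge_l`, and the helpers) are made up for this example. Established ROUGE packages apply extra preprocessing, so their numbers may differ.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """F1 from an overlap count and the candidate/reference totals."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1 with clipped n-gram counts (assumed whitespace tokens)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # per-n-gram minimum of both counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    # Hypothetical example pair, not taken from any of the test sets.
    reference = "paris is the capital of france"
    candidate = "the capital of france is paris"
    print(f"ROUGE-1: {rouge_n(candidate, reference, 1):.3f}")  # 1.000
    print(f"ROUGE-2: {rouge_n(candidate, reference, 2):.3f}")  # 0.600
    print(f"ROUGE-L: {rouge_l(candidate, reference):.3f}")     # 0.667
```

The example pair shows why the three metrics can disagree: both sentences use the same words (ROUGE-1 is 1.0), but the different word order lowers ROUGE-2 and ROUGE-L. This is one reason a model can lead on one metric and trail on another, as in the tables above.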