---
license: afl-3.0
---

# Generating Declarative Statements from QA Pairs

There are already some rule-based models that can accomplish this task, but I haven't seen any transformer-based models that can do so. Therefore, I trained this model based on `Bart-base` to transform QA pairs into declarative statements.

I compared the my model with other rule base models, including 

> [paper1](https://aclanthology.org/D19-5401.pdf) (2019), which proposes **2 Encoder Pointer-Gen model**

and

> [paper2](https://arxiv.org/pdf/2112.03849.pdf) (2021), which propose **RBV2 model**

**Here are results compared to 2 Encoder Pointer-Gen model (on testset released by paper1)**

Test on testset

| Model   | 2 Encoder Pointer-Gen(2019) | BART-base  |
| ------- | --------------------------- | ---------- |
| BLEU    | 74.05                       | **78.878** |
| ROUGE-1 | 91.24                       | **91.937** |
| ROUGE-2 | 81.91                       | **82.177** |
| ROUGE-L | 86.25                       | **87.172** |

Test on NewsQA testset

| Model   | 2 Encoder Pointer-Gen | BART       |
| ------- | --------------------- | ---------- |
| BLEU    | 73.29                 | **74.966** |
| ROUGE-1 | **95.38**             | 89.328     |
| ROUGE-2 | **87.18**             | 78.538     |
| ROUGE-L | **93.65**             | 87.583     |

Test on free_base testset

| Model   | 2 Encoder Pointer-Gen | BART       |
| ------- | --------------------- | ---------- |
| BLEU    | 75.41                 | **76.082** |
| ROUGE-1 | **93.46**             | 92.693     |
| ROUGE-2 | **82.29**             | 81.216     |
| ROUGE-L | **87.5**              | 86.834     |


**As paper2 doesn't release its own dataset, it's hard to make a fair comparison. But according to results in paper2, the Bleu and ROUGE score of their model is lower than that of MPG, which is exactly the 2 Encoder Pointer-Gen model.**

| Model        | BLEU | ROUGE-1 | ROUGE-2 | ROUGE-L |
| ------------ | ---- | ------- | ------- | ------- |
| RBV2         | 74.8 | 95.3    | 83.1    | 90.3    |
| RBV2+BERT    | 71.5 | 93.9    | 82.4    | 89.5    |
| RBV2+RoBERTa | 72.1 | 94      | 83.1    | 89.8    |
| RBV2+XLNET   | 71.2 | 93.6    | 82.3    | 89.4    |
| MPG          | 75.8 | 94.4    | 87.4    | 91.6    |

There are reasons to believe that my model performs better than RBV2.

To sum up,my model performs nearly as well as the SOTA rule-based model evaluated with BLEU and ROUGE score. However the sentence pattern is lack of diversity.

(It's worth mentioning that even though I tried my best to conduct objective tests, the testsets I could find were more or less different from what they introduced in the paper.)