Dataset
A benchmark for multi-dimensional evaluation of question generation (QG). It consists of 200 instances drawn from SQuAD and HotpotQA. Each instance contains 15 questions, one generated by each of 15 different QG models.
Evaluation dimensions:
- fluency
- clarity
- conciseness
- relevance
- consistency
- answerability
- answer consistency
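
As a rough illustration of how scores along these seven dimensions might be aggregated, here is a minimal Python sketch. Only the dimension names come from the list above. The record layout (`question`, `scores`), the helper `average_scores`, and the score values are hypothetical and are not taken from the benchmark's actual files.

```python
from statistics import mean

# The seven dimension names come from the list above; the record layout
# and score values below are assumptions made for illustration only.
DIMENSIONS = [
    "fluency", "clarity", "conciseness", "relevance",
    "consistency", "answerability", "answer consistency",
]

def average_scores(questions):
    """Return the mean score per dimension over a list of scored questions."""
    return {dim: mean(q["scores"][dim] for q in questions) for dim in DIMENSIONS}

# Two made-up scored questions (hypothetical values, not real benchmark data).
example = [
    {"question": "Who wrote the novel?", "scores": {dim: 3 for dim in DIMENSIONS}},
    {"question": "When was it published?", "scores": {dim: 1 for dim in DIMENSIONS}},
]

averages = average_scores(example)
```

Keeping the dimension names in one list means every question is checked against the same seven keys, so a missing score raises a `KeyError` rather than being silently skipped.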
Models
Trained QG models used for generating questions to be evaluated.