This is a BERT model trained for extractive question answering. We start from the pretrained checkpoint `bert-large-uncased-whole-word-masking-finetuned-squad` from https://huggingface.co/transformers/pretrained_models.html and further fine-tune it on the train split of DocVQA. This is the best-performing BERT model in our experiments, reported in our paper (https://arxiv.org/abs/2007.00398); it is the model listed last in Table 3 of the paper, which yields an ANLS (Average Normalized Levenshtein Similarity) score of 0.655 on the val split and 0.665 on the test split. For this model's predictions on the val and test splits, see the `docvqa_eval_results` folder.
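As a minimal sketch, the starting SQuAD checkpoint can be loaded with the `transformers` library as below; the DocVQA-fine-tuned weights would be loaded the same way by pointing `from_pretrained` at the local checkpoint directory instead. The question and context strings here are hypothetical placeholders, not DocVQA data.

```python
from transformers import BertForQuestionAnswering, BertTokenizer, pipeline

# The pretrained SQuAD checkpoint we fine-tune from; swap in a local
# path to load the DocVQA-fine-tuned weights instead (assumption: the
# checkpoint is saved in the standard transformers format).
model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForQuestionAnswering.from_pretrained(model_name)

# Wrap model and tokenizer in an extractive QA pipeline.
qa = pipeline("question-answering", model=model, tokenizer=tokenizer)

# Illustrative question/context pair (not from DocVQA).
result = qa(
    question="What is the invoice number?",
    context="Invoice number: 12345. Date: 01/02/2020. Total: $100.",
)
print(result["answer"], result["score"])
```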