Question Aware Vision Transformer for Multimodal Reasoning • Paper • 2402.05472 • Published Feb 8, 2024
DocLLM: A layout-aware generative language model for multimodal document understanding • Paper • 2401.00908 • Published Dec 31, 2023