I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token Paper • 2412.06676 • Published 16 days ago • 9
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding Paper • 2409.03420 • Published Sep 5 • 26
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs Paper • 2404.16375 • Published Apr 25 • 16
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation Paper • 2311.07562 • Published Nov 13, 2023 • 13