Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant Paper • 2410.13360 • Published Oct 17
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning Paper • 2411.18203 • Published 28 days ago
Towards Interpreting Visual Information Processing in Vision-Language Models Paper • 2410.07149 • Published Oct 9
Understanding Alignment in Multimodal LLMs: A Comprehensive Study Paper • 2407.02477 • Published Jul 2
Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy Paper • 2411.15453 • Published Nov 23
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models Paper • 2411.14982 • Published Nov 22
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token Paper • 2412.06676 • Published 16 days ago
From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding Paper • 2412.06474 • Published 16 days ago
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation Paper • 2412.09585 • Published 12 days ago
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Paper • 2412.09604 • Published 12 days ago
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer Paper • 2412.13871 • Published 7 days ago
FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published 7 days ago