VLM - a poonyZ Collection

VLM

updated 2 days ago

Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant

Paper • 2410.13360 • Published Oct 17 • 8

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Paper • 2411.18203 • Published 28 days ago • 31

Towards Interpreting Visual Information Processing in Vision-Language Models

Paper • 2410.07149 • Published Oct 9 • 1

Understanding Alignment in Multimodal LLMs: A Comprehensive Study

Paper • 2407.02477 • Published Jul 2 • 21

Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy

Paper • 2411.15453 • Published Nov 23

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Paper • 2411.14982 • Published Nov 22 • 15

I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token

Paper • 2412.06676 • Published 16 days ago • 9

From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding

Paper • 2412.06474 • Published 16 days ago

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation

Paper • 2412.09585 • Published 12 days ago • 10

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published 12 days ago • 35

Analyzing The Language of Visual Tokens

Paper • 2411.05001 • Published Nov 7 • 22

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

Paper • 2412.13871 • Published 7 days ago • 17

FastVLM: Efficient Vision Encoding for Vision Language Models

Paper • 2412.13303 • Published 7 days ago • 13