LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model Paper • 2312.17240 • Published Dec 28, 2023 • 1
RTime-QA: A Benchmark for Atomic Temporal Event Understanding in Large Multi-modal Models Paper • 2505.19125 • Published May 25, 2025
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models Paper • 2512.16561 • Published Dec 18, 2025 • 20
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published 5 days ago • 95
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published 5 days ago • 95
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models Paper • 2512.16561 • Published Dec 18, 2025 • 20
RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing Paper • 2512.16864 • Published Dec 18, 2025 • 11
RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing Paper • 2512.16864 • Published Dec 18, 2025 • 11