M$^3$CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought Paper • 2405.16473 • Published May 26
Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement Paper • 2406.17233 • Published Jun 25
A Two-Stage Framework with Self-Supervised Distillation For Cross-Domain Text Classification Paper • 2304.09820 • Published Apr 18, 2023
Text is no more Enough! A Benchmark for Profile-based Spoken Language Understanding Paper • 2112.11953 • Published Dec 22, 2021
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models Paper • 2412.05939 • Published 17 days ago • 12
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published 20 days ago • 48
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published 21 days ago • 118
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18 • 74
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18 • 53
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models Paper • 2407.12772 • Published Jul 17 • 33
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models Paper • 2407.07895 • Published Jul 10 • 40