Submitted by beccabai 53 LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models · 15 authors 4
Submitted by richardxp888 48 MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models · 12 authors 4
Submitted by BiaoGong 45 Animate-X: Universal Character Image Animation with Enhanced Motion Representation · 9 authors 3
Submitted by dongguanting 43 Toward General Instruction-Following Alignment for Retrieval-Augmented Generation · 6 authors 3
Submitted by wenhu 34 MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks · 16 authors 3
Submitted by LituRout 26 Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations · 6 authors 3
Submitted by KbsdJames 26 Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models · 20 authors 3
Submitted by wlin21at 25 LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content · 11 authors 2
Submitted by ir1d 23 Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention · 8 authors 4
Submitted by Cuiunbo 21 VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents · 11 authors 2
Submitted by mucai 14 TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models · 15 authors 2
Submitted by Tigerph 14 Rethinking Data Selection at Scale: Random Selection is Almost All You Need · 8 authors 3
Submitted by xiaowu0162 9 LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory · 6 authors 2
Submitted by ArmelRandy 8 Tree of Problems: Improving structured problem solving with compositionality · 3 authors 2
Submitted by akhaliq 7 Thinking LLMs: General Instruction Following with Thought Generation · 6 authors 3
Submitted by zengziyun 7 MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models · 8 authors 2
Submitted by yjze 6 Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies · 8 authors 2
Submitted by Guangxuan-Xiao 5 DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads · 8 authors 2
Submitted by ruochenz 5 The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling · 5 authors 2
Submitted by nandan523 3 ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models · 2 authors 2