Daily Papers

by AK and the research community

Oct 15

Submitted by

BiaoGong

Animate-X: Universal Character Image Animation with Enhanced Motion Representation

·
9 authors

Submitted by

beccabai

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

·
15 authors

Submitted by

richardxp888

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

·
12 authors

Submitted by

dongguanting

Toward General Instruction-Following Alignment for Retrieval-Augmented Generation

·
6 authors

Submitted by

wenhu

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

·
16 authors

Submitted by

LituRout

Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

·
6 authors

Submitted by

KbsdJames

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

·
20 authors

Submitted by

wlin21at

LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content

·
11 authors

Submitted by

ir1d

Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention

·
8 authors

Submitted by

Cuiunbo

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

·
11 authors

Submitted by

akhaliq

Thinking LLMs: General Instruction Following with Thought Generation

·
6 authors

Submitted by

Tigerph

Rethinking Data Selection at Scale: Random Selection is Almost All You Need

·
8 authors

Submitted by

mucai

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

·
15 authors

Submitted by

xiaowu0162

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

·
6 authors

Submitted by

zengziyun

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models

·
8 authors

Submitted by

ArmelRandy

Tree of Problems: Improving structured problem solving with compositionality

·
3 authors

Submitted by

Guangxuan-Xiao

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

·
8 authors

Submitted by

yjze

Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies

·
8 authors

Submitted by

ruochenz

The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling

·
5 authors

Submitted by

mdorkenw

TVBench: Redesigning Video-Language Evaluation

·
5 authors

Submitted by

nandan523

ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models

·
2 authors

Submitted by

seonghyeonye

Latent Action Pretraining from Videos

·
16 authors