DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought Paper • 2412.17498 • Published 2 days ago • 13
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response Paper • 2412.14922 • Published 6 days ago • 70
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching Paper • 2412.17153 • Published 3 days ago • 26
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain Paper • 2412.13018 • Published 8 days ago • 40
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation Paper • 2412.10704 • Published 11 days ago • 14
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models Paper • 2412.09645 • Published 15 days ago • 35
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 13 days ago • 74
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 12 days ago • 131
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation Paper • 2412.04432 • Published 20 days ago • 14
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 16 days ago • 62
p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay Paper • 2412.04449 • Published 20 days ago • 6
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published 19 days ago • 121
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration Paper • 2412.04440 • Published 20 days ago • 19
Mind the Time: Temporally-Controlled Multi-Event Video Generation Paper • 2412.05263 • Published 19 days ago • 10
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases Paper • 2412.04862 • Published 19 days ago • 48