Diving into Self-Evolving Training for Multimodal Reasoning Paper • 2412.17451 • Published 2 days ago • 27
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought Paper • 2412.17498 • Published 2 days ago • 13
RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios Paper • 2412.08972 • Published 14 days ago • 9
VisionArena: 230K Real World User-VLM Conversations with Preference Labels Paper • 2412.08687 • Published 14 days ago • 13
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials Paper • 2412.09605 • Published 13 days ago • 25
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 12 days ago • 131
Finance Commons Collection A large collection of multimodal financial documents in open data. • 7 items • Updated Jul 17 • 7
Article Releasing the largest multilingual open pretraining dataset By Pclanglais • Nov 13 • 98
YiZhao Dataset Collection Data and filtering models of our financial open-source YiZhao Dataset. • 5 items • Updated 14 days ago • 1
Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment Paper • 2411.17188 • Published 29 days ago • 21
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published 29 days ago • 76
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning Paper • 2411.18203 • Published 28 days ago • 31
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation Paper • 2412.02592 • Published 22 days ago • 20
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation Paper • 2412.02259 • Published 22 days ago • 59
Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement Paper • 2412.04003 • Published 20 days ago • 9
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published 21 days ago • 43