EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents Paper • 2502.09560 • Published 1 day ago • 22
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling Paper • 2407.21787 • Published Jul 31, 2024 • 13
Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity Paper • 2501.16295 • Published 19 days ago • 8
CodeMonkeys: Scaling Test-Time Compute for Software Engineering Paper • 2501.14723 • Published 22 days ago • 8
MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents Paper • 2502.05957 • Published 6 days ago • 13
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 5 days ago • 117
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published 5 days ago • 47
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates Paper • 2502.06772 • Published 5 days ago • 17
Scaling Pre-training to One Hundred Billion Data for Vision Language Models Paper • 2502.07617 • Published 4 days ago • 23
Teaching Language Models to Critique via Reinforcement Learning Paper • 2502.03492 • Published 10 days ago • 21
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! Paper • 2502.07374 • Published 4 days ago • 27
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published 4 days ago • 28
QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation Paper • 2502.05178 • Published 8 days ago • 10
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference Paper • 2502.04416 • Published 9 days ago • 10
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models Paper • 2502.04404 • Published 9 days ago • 18
Generating Symbolic World Models via Test-time Scaling of Large Language Models Paper • 2502.04728 • Published 8 days ago • 16