rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 5 days ago • 197
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published 19 days ago • 89
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published 20 days ago • 66
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Paper • 2501.01427 • Published 11 days ago • 46
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published 17 days ago • 78
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 12 days ago • 92
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published about 1 month ago • 137
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 26 days ago • 121
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 88
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published Dec 10, 2024 • 50
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models Paper • 2412.04146 • Published Dec 5, 2024 • 22
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 105
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance Paper • 2412.02687 • Published Dec 3, 2024 • 108
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 122
One Shot, One Talk: Whole-body Talking Avatar from a Single Image Paper • 2412.01106 • Published Dec 2, 2024 • 18