Search-o1: Agentic Search-Enhanced Large Reasoning Models Paper • 2501.05366 • Published 3 days ago • 47
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 4 days ago • 65
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682 • Published 4 days ago • 68
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use Paper • 2501.02506 • Published 7 days ago • 9
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models Paper • 2501.01830 • Published 9 days ago • 15
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution Paper • 2501.02976 • Published 6 days ago • 46
PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides Paper • 2501.03936 • Published 5 days ago • 16
Cosmos World Foundation Model Platform for Physical AI Paper • 2501.03575 • Published 5 days ago • 54
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models Paper • 2501.02955 • Published 6 days ago • 39
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published 5 days ago • 42
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published 9 days ago • 33
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Paper • 2501.01427 • Published 10 days ago • 46
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 11 days ago • 91
PERSE: Personalized 3D Generative Avatars from A Single Portrait Paper • 2412.21206 • Published 13 days ago • 15
OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System Paper • 2412.20005 • Published 15 days ago • 17
On the Compositional Generalization of Multimodal LLMs for Medical Imaging Paper • 2412.20070 • Published 15 days ago • 43
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization Paper • 2412.21037 • Published 13 days ago • 23
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models Paper • 2412.18609 • Published 19 days ago • 15
WavePulse: Real-time Content Analytics of Radio Livestreams Paper • 2412.17998 • Published 20 days ago • 10
How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System? Paper • 2412.18495 • Published 19 days ago • 8