Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published 8 days ago • 58
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning Paper • 2503.05379 • Published 14 days ago • 33
Magic 1-For-1: Generating One Minute Video Clips within One Minute Paper • 2502.07701 • Published Feb 11 • 34
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4 • 62
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published Jan 21 • 54
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 276
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4, 2024 • 95
Orca 2: Teaching Small Language Models How to Reason Paper • 2311.11045 • Published Nov 18, 2023 • 75
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models Paper • 2311.10093 • Published Nov 16, 2023 • 58
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities Paper • 2311.05698 • Published Nov 9, 2023 • 14
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models Paper • 2311.05997 • Published Nov 10, 2023 • 37
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model Paper • 2309.16058 • Published Sep 27, 2023 • 55