AudioMosaic Collection ICML2026 AudioMosaic: Contrastive Masked Audio Representation Learning • 15 items • Updated 29 days ago • 3
MOSS-Audio Collection An open-source audio understanding model supporting speech recognition, environmental sound analysis, music understanding, time-aware QA, and complex • 7 items • Updated May 2 • 66
gliner2 family Collection GLiNER2 extends the original GLiNER architecture to support multi-task information extraction with a schema-driven interface. • 7 items • Updated 24 days ago • 49
CubePart: An Open-Vocabulary Part-Controllable 3D Generator Paper • 2605.28763 • Published 13 days ago • 14
GEM: Generative Supervision Helps Embodied Intelligence Paper • 2605.28548 • Published 13 days ago • 41
InstructSAM: Segment Any Instance with Any Instructions Paper • 2605.26102 • Published 15 days ago • 17
ControlLight: Towards Controllable, Consistent, and Generalizable Low-Light Enhancement Paper • 2605.25569 • Published 15 days ago • 21
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models Paper • 2605.21573 • Published 20 days ago • 109
TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction Paper • 2605.26115 • Published 15 days ago • 51
MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU Paper • 2604.05091 • Published Apr 6 • 47
RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments Paper • 2604.26067 • Published Apr 28 • 74
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published Apr 15 • 124
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published Apr 8 • 189