π0 and π0-FAST: Vision-Language-Action Models for General Robot Control • Article • Feb 4 • 139
Phantom: Subject-consistent video generation via cross-modal alignment • Paper • 2502.11079 • Published Feb 16 • 58
Vision Arena (Testing VLMs side-by-side): Analyze images to detect and label objects • Running • 543
Cosmos Tokenizer • Collection • A suite of image and video tokenizers • 13 items • Updated about 23 hours ago • 40
Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use • Paper • 2410.24218 • Published Oct 31, 2024 • 6