OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction • Paper 2503.03734 • Published 8 days ago
In-Context Imitation Learning via Next-Token Prediction • Paper 2408.15980 • Published Aug 28, 2024