DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention Paper • 2309.14327 • Published Sep 25, 2023 • 22
MambaVision: A Hybrid Mamba-Transformer Vision Backbone Paper • 2407.08083 • Published Jul 10, 2024 • 30
Teaching Transformers Causal Reasoning through Axiomatic Training Paper • 2407.07612 • Published Jul 10, 2024 • 2
PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM Paper • 2503.07111 • Published 8 days ago • 2