2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining • Paper • 2501.00958 • Published 11 days ago • 92
LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting • Paper • 2412.00177 • Published Nov 29, 2024 • 7 • 3
No More Adam: Learning Rate Scaling at Initialization is All You Need • Paper • 2412.11768 • Published 27 days ago • 41
Offline Reinforcement Learning for LLM Multi-Step Reasoning • Paper • 2412.16145 • Published 23 days ago • 38
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices • Paper • 2411.10640 • Published Nov 16, 2024 • 44
internlm/internlm-xcomposer2-vl-7b • Visual Question Answering • Updated Apr 12, 2024 • 2.22k • 80
sentence-transformers/clip-ViT-B-32-multilingual-v1 • Sentence Similarity • Updated Nov 5, 2024 • 250k • 148