Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 13 days ago • 75
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models Paper • 2408.02718 • Published Aug 5 • 60
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 183