MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design • Paper • 2412.14590 • Published 6 days ago • 8
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference • Paper • 2412.13663 • Published 7 days ago • 103
Apollo: An Exploration of Video Understanding in Large Multimodal Models • Paper • 2412.10360 • Published 12 days ago • 131
POINTS1.5: Building a Vision-Language Model towards Real World Applications • Paper • 2412.08443 • Published 14 days ago • 38
ProcessBench: Identifying Process Errors in Mathematical Reasoning • Paper • 2412.06559 • Published 16 days ago • 68
CompCap: Improving Multimodal Large Language Models with Composite Captions • Paper • 2412.05243 • Published 19 days ago • 18
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion • Paper • 2412.04424 • Published 20 days ago • 55
PaliGemma 2: A Family of Versatile VLMs for Transfer • Paper • 2412.03555 • Published 21 days ago • 118
PaliGemma 2 Release • Collection • Vision-Language Models available in multiple 3B, 10B and 28B variants. • 23 items • Updated 12 days ago • 119
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs • Paper • 2411.19146 • Published 27 days ago • 13
ShowUI: One Vision-Language-Action Model for GUI Visual Agent • Paper • 2411.17465 • Published 29 days ago • 76
Star Attention: Efficient LLM Inference over Long Sequences • Paper • 2411.17116 • Published 29 days ago • 47
SmolVLM • Collection • State-of-the-art compact VLMs for on-device applications: Base, Synthetic, and Instruct • 5 items • Updated 3 days ago • 30