SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7 • 158
A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality Article • Published Mar 4 • 73
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 142
SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering? Paper • 2502.13233 • Published Feb 18 • 14
Baichuan-M1: Pushing the Medical Capability of Large Language Models Paper • 2502.12671 • Published Feb 18 • 1
Scaling Test-Time Compute Without Verification or RL is Suboptimal Paper • 2502.12118 • Published Feb 17 • 1
Is Noise Conditioning Necessary for Denoising Generative Models? Paper • 2502.13129 • Published Feb 18 • 1
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario Paper • 2501.10132 • Published Jan 17 • 20
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22 • 91
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published Jan 29 • 58
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 223