view article Article Introducing smolagents: simple agents that write actions in code. Dec 31, 2024 • 963
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models Paper • 2502.14802 • Published Feb 20 • 13
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data Paper • 2502.14044 • Published Feb 19 • 8
RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers Paper • 2502.14377 • Published Feb 20 • 12
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published Feb 20 • 13
NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization Paper • 2502.14638 • Published Feb 20 • 11
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning Paper • 2502.12853 • Published Feb 18 • 29
LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models Paper • 2502.14834 • Published Feb 20 • 24
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 141
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features Paper • 2502.04320 • Published Feb 6 • 37
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments Paper • 2501.10893 • Published Jan 18 • 26