Collections
Discover the best community collections!
Collections including paper arxiv:2503.20215
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 28 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 30 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 27 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 101
-
Qwen2.5-Omni Technical Report
Paper • 2503.20215 • Published • 134 -
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper • 2504.01990 • Published • 241 -
Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research
Paper • 2502.04644 • Published • 2 -
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
Paper • 2504.07956 • Published • 43
-
Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
Paper • 2503.16870 • Published • 5 -
Gemma 3 Technical Report
Paper • 2503.19786 • Published • 46 -
Qwen2.5-Omni Technical Report
Paper • 2503.20215 • Published • 134 -
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
Paper • 2503.19855 • Published • 25
-
Reinforcement Learning: An Overview
Paper • 2412.05265 • Published • 7 -
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Paper • 2411.01156 • Published • 6 -
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
Paper • 2503.21755 • Published • 31 -
Qwen2.5-Omni Technical Report
Paper • 2503.20215 • Published • 134
-
Survey on Evaluation of LLM-based Agents
Paper • 2503.16416 • Published • 86 -
Qwen2.5-Omni Technical Report
Paper • 2503.20215 • Published • 134 -
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper • 2504.01990 • Published • 241
-
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
Paper • 2503.17352 • Published • 22 -
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation
Paper • 2503.16660 • Published • 71 -
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
Paper • 2503.18931 • Published • 29 -
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding
Paper • 2503.13964 • Published • 19