GraphicBench: A Planning Benchmark for Graphic Design with Language Agents Paper • 2504.11571 • Published 5 days ago
Exploring Expert Failures Improves LLM Agent Tuning Paper • 2504.13145 • Published 3 days ago • 11 • 4
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective Paper • 2502.14296 • Published Feb 20 • 46
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? Paper • 2410.21259 • Published Oct 28, 2024 • 1
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published 11 days ago • 45 • 4
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning Paper • 2504.05520 • Published 13 days ago • 9
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients Paper • 2504.10766 • Published 6 days ago • 37 • 2
Towards Visual Text Grounding of Multimodal Large Language Model Paper • 2504.04974 • Published 14 days ago • 15
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing Paper • 2504.07964 • Published 10 days ago • 59