CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era Paper • 2503.12329 • Published Mar 16 • 24
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published Feb 11 • 48
UI Agent Collection a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robotics • 357 items • Updated 44 minutes ago • 52
Teaching Language Models to Critique via Reinforcement Learning Paper • 2502.03492 • Published Feb 5 • 24
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published Dec 27, 2024 • 89
Diving into Self-Evolving Training for Multimodal Reasoning Paper • 2412.17451 • Published Dec 23, 2024 • 44
VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models Paper • 2411.17451 • Published Nov 26, 2024 • 11
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published Sep 25, 2024 • 63
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models Paper • 2404.12387 • Published Apr 18, 2024 • 40
A Primer on the Inner Workings of Transformer-based Language Models Paper • 2405.00208 • Published Apr 30, 2024 • 10
Calibrating Reasoning in Language Models with Internal Consistency Paper • 2405.18711 • Published May 29, 2024 • 6
🔍 Interpretability & Analysis of LMs Collection Outstanding research in LM interpretability and evaluation, summarized • 107 items • Updated 9 days ago • 99
Silkie: Preference Distillation for Large Visual Language Models Paper • 2312.10665 • Published Dec 17, 2023 • 11