Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published 17 days ago • 46
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published 27 days ago • 40
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization Paper • 2502.13922 • Published Feb 19 • 25
Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach Paper • 2502.03639 • Published Feb 5 • 9
🔍 Interpretability & Analysis of LMs Collection Outstanding research in LM interpretability and evaluation, summarized • 105 items • Updated 24 days ago • 97
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models Paper • 2501.03124 • Published Jan 6 • 14
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024 • 44
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models Paper • 2411.14982 • Published Nov 22, 2024 • 16
Multimodal-SAE Collection The collection of the sae that hooked on llava • 5 items • Updated Mar 4 • 8
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 134
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Paper • 2411.14405 • Published Nov 21, 2024 • 61
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework Paper • 2411.06176 • Published Nov 9, 2024 • 45