Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection Paper • 2503.12271 • Published 7 days ago • 8
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation Paper • 2502.03860 • Published Feb 6, 2025 • 24
Reward-Guided Speculative Decoding for Efficient LLM Reasoning Paper • 2501.19324 • Published Jan 31, 2025 • 38
Demystifying Domain-adaptive Post-training for Financial LLMs Paper • 2501.04961 • Published Jan 9, 2025 • 11
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows Paper • 2412.01169 • Published Dec 2, 2024 • 13
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024 • 64
HYPO: Hyperspherical Out-of-Distribution Generalization Paper • 2402.07785 • Published Feb 12, 2024
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models Paper • 2403.20331 • Published Mar 29, 2024 • 16
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey Paper • 2407.21794 • Published Jul 31, 2024 • 6
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models Paper • 2406.14852 • Published Jun 21, 2024
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction Paper • 2409.17422 • Published Sep 25, 2024 • 25
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows" Paper • 2410.03727 • Published Sep 30, 2024 • 2
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs Paper • 2410.04698 • Published Oct 7, 2024 • 13
Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation Paper • 2303.04991 • Published Mar 9, 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning Paper • 2311.18799 • Published Nov 30, 2023 • 1