D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation Paper • 2504.09454 • Published 8 days ago • 11
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 6 days ago • 228
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published 19 days ago • 80
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published 11 days ago • 70
FreSca: Unveiling the Scaling Space in Diffusion Models Paper • 2504.02154 • Published 18 days ago • 18
ZClip: Adaptive Spike Mitigation for LLM Pre-Training Paper • 2504.02507 • Published 17 days ago • 76
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources Paper • 2504.00595 • Published 19 days ago • 34
AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation Paper • 2503.19693 • Published 26 days ago • 75
ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model Paper • 2503.21144 • Published 25 days ago • 25
Training and Finetuning Reranker Models with Sentence Transformers v4 Article • Published 26 days ago • 112
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders Paper • 2503.18878 • Published 27 days ago • 117
Modifying Large Language Model Post-Training for Diverse Creative Writing Paper • 2503.17126 • Published about 1 month ago • 36
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models Paper • 2503.16257 • Published Mar 20 • 24