Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 155
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 7 days ago • 235
SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking Paper • 2503.00955 • Published Mar 2 • 27
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens Paper • 2502.18890 • Published Feb 26 • 28
From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub Article • Published Feb 12 • 62
From Files to Chunks: Improving Hugging Face Storage Efficiency Article • Published Nov 20, 2024 • 58
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated Feb 20 • 253
The Differences Between Direct Alignment Algorithms are a Blur Paper • 2502.01237 • Published Feb 3 • 115
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models Paper • 2502.01061 • Published Feb 3 • 213
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28 • 120
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 286
AI Paper of the Day Collection A curated collection of interesting papers, one added each day • 336 items • Updated 1 day ago • 41
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 101
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 147