Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding Paper • 2412.17295 • Published 3 days ago • 7
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching Paper • 2412.17153 • Published 3 days ago • 26
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought Paper • 2412.17498 • Published 2 days ago • 13
Deliberation in Latent Space via Differentiable Cache Augmentation Paper • 2412.17747 • Published 2 days ago • 20
FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published 8 days ago • 13
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 7 days ago • 103
Whisper-GPT: A Hybrid Representation Audio Large Language Model Paper • 2412.11449 • Published 10 days ago • 4
TidyBot++: An Open-Source Holonomic Mobile Manipulator for Robot Learning Paper • 2412.10447 • Published 14 days ago • 5
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6 • 51
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs Paper • 2412.08347 • Published 14 days ago • 4
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Paper • 2412.09626 • Published 13 days ago • 19
Gradio WebRTC Cookbook ⚡️ Collection • Collection of real-time voice and video demos built with the gradio-webrtc custom component • 8 items • Updated 15 days ago • 9
Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement Paper • 2412.04003 • Published 20 days ago • 9
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance Paper • 2412.02687 • Published 22 days ago • 109
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published 19 days ago • 121