Submitted by roadjiang 119 Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model · 54 authors 10
Submitted by YuuTennYi 47 GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation · 5 authors 2
Submitted by BestWishYsh 38 MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft · 7 authors 3
Submitted by tianchez 30 VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model · 12 authors 2
Submitted by ZhuangXialie 25 SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning · 6 authors 2
Submitted by yeates 17 ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration · 10 authors 2
Submitted by BestWishYsh 10 FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation · 4 authors 2
Submitted by DannyLan 10 Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models · 4 authors 6
Submitted by akhaliq 9 Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images · 7 authors 2
Submitted by stefan-it 9 ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance · 3 authors 3
Submitted by sauradip 8 In-2-4D: Inbetweening from Two Single-View Images to 4D Generation · 4 authors 2
Submitted by AdinaY 8 Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs · 52 authors 3
Submitted by jialuliluka 6 Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization · 6 authors 2
Submitted by nielsr 6 UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation · 3 authors 2
Submitted by richard-guyunqi 6 BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing · 5 authors 2
Submitted by gabrielelozupone98 5 Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging · 6 authors 2
Submitted by ruipeterpan 5 SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning · 6 authors 2
Submitted by aashiqmuhamed 4 SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs · 4 authors 2
Submitted by saidwivedi 4 InteractVLM: 3D Interaction Reasoning from 2D Foundational Models · 7 authors 2