Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 11 days ago • 130
Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats Paper • 2410.12781 • Published Oct 16 • 5
MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation Paper • 2410.02458 • Published Oct 3 • 9
MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis Paper • 2410.02103 • Published Oct 2 • 8
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion Paper • 2409.17145 • Published Sep 25 • 13
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt Paper • 2409.12892 • Published Sep 19 • 5
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18 • 74
Single-Layer Learnable Activation for Implicit Neural Representation (SL^{2}A-INR) Paper • 2409.10836 • Published Sep 17 • 4
SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction Paper • 2409.11211 • Published Sep 17 • 8
Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos Paper • 2409.08353 • Published Sep 12 • 10
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture Paper • 2409.02889 • Published Sep 4 • 55
VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges Paper • 2409.01071 • Published Sep 2 • 27
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28 • 84
FLoD: Integrating Flexible Level of Detail into 3D Gaussian Splatting for Customizable Rendering Paper • 2408.12894 • Published Aug 23 • 4
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20 • 58
SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views Paper • 2408.10195 • Published Aug 19 • 12