Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding Paper • 2501.07888 • Published Jan 14 • 15
DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis Paper • 2405.14224 • Published May 23, 2024 • 16