Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published 6 days ago • 25
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP Paper • 2308.02487 • Published Aug 4, 2023 • 12
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models Paper • 2406.09416 • Published Jun 13 • 27
LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression Paper • 2406.20092 • Published Jun 28
General Object Foundation Model for Images and Videos at Scale Paper • 2312.09158 • Published Dec 14, 2023 • 8