Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding • Paper • 2501.17578 • Published 5 days ago • 1
iFormer: Integrating ConvNet and Transformer for Mobile Application • Paper • 2501.15369 • Published 9 days ago • 10
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine • Paper • 2408.02900 • Published Aug 6, 2024 • 28
The Geometry of Tokens in Internal Representations of Large Language Models • Paper • 2501.10573 • Published 17 days ago • 8
The GAN is dead; long live the GAN! A Modern GAN Baseline • Paper • 2501.05441 • Published 25 days ago • 87
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding • Paper • 2501.07783 • Published 21 days ago • 7
Cosmos Tokenizer • Collection • A suite of image and video tokenizers • 13 items • Updated 17 days ago • 37
Generalized Gaussian Model for Learned Image Compression • Paper • 2411.19320 • Published Nov 28, 2024 • 1
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token • Paper • 2412.06676 • Published Dec 9, 2024 • 9
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model • Paper • 2411.17459 • Published Nov 26, 2024 • 11
Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations? • Paper • 2406.10743 • Published Jun 15, 2024 • 1
Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings • Paper • 2411.08017 • Published Nov 12, 2024 • 11