MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization • Paper • 2504.00999 • Published 16 days ago • 78
Improved Visual-Spatial Reasoning via R1-Zero-Like Training • Paper • 2504.00883 • Published 16 days ago • 60
Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens • Paper • 2503.08377 • Published Mar 11 • 2
PainterNet: Adaptive Image Inpainting with Actual-Token Attention and Diverse Mask Control • Paper • 2412.01223 • Published Dec 2, 2024 • 1