Gemma 3 seems to be really good at human preference. Just waiting for people to see it. 🔥
A Multimodal Symphony: Integrating Taste and Sound through Generative AI Paper • 2503.02823 • Published 9 days ago
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion Paper • 2503.01183 • Published 10 days ago
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation Paper • 2502.20583 • Published 13 days ago
Wan2.1 🔥📹 New OPEN video model by the Alibaba Wan team!
Model: Wan-AI/Wan2.1-T2V-14B
Demo: Wan-AI/Wan2.1
✨ Apache 2.0
✨ 8.19GB VRAM (T2V-1.3B model), runs on most consumer GPUs
✨ Multi-tasking: T2V, I2V, Video Editing, T2I, V2A
✨ Visual text generation: supports Chinese & English text in videos
✨ Powerful video VAE: encodes/decodes 1080P video while preserving temporal information
She arrived 😍 [Expect more models soon...]
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation Paper • 2502.13128 • Published 23 days ago
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Paper • 2502.10248 • Published 27 days ago