Amphion: An Open-Source Audio, Music and Speech Generation Toolkit Paper • 2312.09911 • Published Dec 15, 2023 • 53
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents Paper • 2311.05437 • Published Nov 9, 2023 • 48
Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model Paper • 2311.06214 • Published Nov 10, 2023 • 30
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module Paper • 2311.05556 • Published Nov 9, 2023 • 82
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model Paper • 2309.16058 • Published Sep 27, 2023 • 55
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis Paper • 2310.00426 • Published Sep 30, 2023 • 61
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages Paper • 2309.09400 • Published Sep 17, 2023 • 84
CodePlan: Repository-level Coding using LLMs and Planning Paper • 2309.12499 • Published Sep 21, 2023 • 74
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning Paper • 2309.15091 • Published Sep 26, 2023 • 32
Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition Paper • 2309.15223 • Published Sep 26, 2023 • 19
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack Paper • 2309.15807 • Published Sep 27, 2023 • 32
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans Paper • 2308.08545 • Published Aug 16, 2023 • 33