Generate virtual camera views from input images
Conversational speech generation
Tuning-free subject-driven generation
Send text and get detailed responses
Generate videos from text or images
OmniParser: turn your LLM into a GUI agent
Generate high-quality audio from text using various controls
A unified multimodal understanding and generation model