Predict and visualize human poses in videos
Generate images from sketches and poses
Transcribe audio files or YouTube videos into text
Analyze images to generate detailed prompts