Combine and process audio files
Download and prepare voice conversion models
Generate text descriptions from images