The output has a slightly strange reverb, likely caused by running the training audio through vocal remover and de-reverb tools, and possibly also by training on only 3 minutes of data.

However, once you add reverb in the remix along with some instrumentals, it is almost unnoticeable.
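For anyone without a DAW or reverb plugin at hand, here is a rough sketch of adding reverb to the converted vocal before mixing it with the instrumental. It is not the tool used for the demos; the function name `add_simple_reverb`, its parameters, and the synthetic impulse response (decaying noise) are all assumptions made for illustration, and it expects a mono float array in the range -1 to 1.

```python
import numpy as np

def add_simple_reverb(dry, sample_rate, decay_seconds=1.2, wet=0.25, seed=0):
    """Blend a mono signal with a copy convolved by a synthetic impulse response.

    Hypothetical helper for illustration only; a real reverb plugin or a
    recorded room impulse response will sound more natural.
    """
    rng = np.random.default_rng(seed)
    n = int(sample_rate * decay_seconds)
    t = np.arange(n) / sample_rate
    # Noise with an exponential envelope that falls to about -60 dB
    # (amplitude factor ~0.001) at decay_seconds.
    ir = rng.standard_normal(n) * np.exp(-6.9 * t / decay_seconds)
    ir /= np.sqrt(np.sum(ir ** 2))  # normalize the impulse response to unit energy

    # FFT-based linear convolution, trimmed back to the input length.
    size = len(dry) + n - 1
    spectrum = np.fft.rfft(dry, size) * np.fft.rfft(ir, size)
    wet_signal = np.fft.irfft(spectrum, size)[: len(dry)]

    # Wet/dry blend, scaled down only if the result would clip.
    mixed = (1.0 - wet) * dry + wet * wet_signal
    peak = np.max(np.abs(mixed))
    return mixed / peak if peak > 1.0 else mixed
```

A `wet` value around 0.2 to 0.3 is only a starting guess; adjust by ear until the residual artifact blends into the added reverb tail.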

It was trained on singing data only, so using talking (speech) reference audio as input will produce strange-sounding speech output.