---
license: mit
---

# ClassicVC

ClassicVC is an any-to-any voice conversion model that lets users design original speaker styles by selecting coordinates in continuous latent spaces. The model components are implemented in PyTorch and are fully compatible with ONNX.

[MMCXLI](https://github.com/lyodos) provides a dedicated graphical user interface (GUI) for ClassicVC. It runs on wxPython and ONNX Runtime. Users can download the ONNX files and try out speech conversion without having to install PyTorch or train a model with their own voice data.

## Model Details

### Model Description

- **Developed by:** Lyodos (Lyodos the City of the Museum)

### Model Sources

- **Repository:** [GitHub](https://github.com/lyodos/classic-vc)

----

## Uses

Under the MIT License, users may use the model code and checkpoints for research purposes. The model is provided with no guarantees.

### Direct Use

[More Information Needed]

### Out-of-Scope Use

This model was prototyped as a hobbyist's research project on any-to-any voice conversion, and we make no guarantees, especially regarding its reliability or real-time operation.

Use in situations involving an unspecified number of people, such as web broadcasting, and in mission-critical applications, including medical, transportation, infrastructure, and weapon systems, is not prohibited by the developer, since the MIT License is the only stated license. However, we do not encourage such use.

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

[Notebook 01 of the ClassicVC repository](https://github.com/lyodos/classic-vc) provides the procedure for offline (non-real-time) voice conversion.
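
Because the model components are distributed as ONNX files, it can help to inspect them with ONNX Runtime before following the notebook or launching the GUI. The sketch below is only an illustration: the directory name `./classic-vc-onnx` is hypothetical, and the actual file names, tensor names, and shapes are not stated in this card, so the script simply lists whatever it finds. The notebooks and the MMCXLI repository remain the reference for how the components are chained together.

```python
from pathlib import Path


def load_session(onnx_path):
    """Open one ONNX file with ONNX Runtime on the CPU."""
    # Imported lazily so describe_io() can be reused without ONNX Runtime installed.
    import onnxruntime as ort

    return ort.InferenceSession(str(onnx_path), providers=["CPUExecutionProvider"])


def describe_io(session):
    """Return (inputs, outputs) as lists of (name, shape, dtype) tuples."""
    inputs = [(arg.name, arg.shape, arg.type) for arg in session.get_inputs()]
    outputs = [(arg.name, arg.shape, arg.type) for arg in session.get_outputs()]
    return inputs, outputs


if __name__ == "__main__":
    # Hypothetical location (an assumption): point this at the folder
    # that holds the downloaded ONNX files.
    model_dir = Path("./classic-vc-onnx")
    for onnx_file in sorted(model_dir.glob("*.onnx")):
        inputs, outputs = describe_io(load_session(onnx_file))
        print(onnx_file.name)
        for name, shape, dtype in inputs:
            print(f"  in : {name} {shape} {dtype}")
        for name, shape, dtype in outputs:
            print(f"  out: {name} {shape} {dtype}")
```

Running the script requires only `onnxruntime`; PyTorch is not needed, which matches the download-and-try workflow described above.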

[The MMCXLI repository](https://github.com/lyodos/mmcxli) provides the GUI, which requires a local Python environment.

----

## Training Details

### Training Data

The model checkpoints provided here were trained on the following three datasets.

1. LibriSpeech ASR corpus
    * V. Panayotov, G. Chen, D. Povey and S. Khudanpur, "Librispeech: An ASR corpus based on public domain audio books," 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia, 2015, pp. 5206-5210, doi: 10.1109/ICASSP.2015.7178964.
    * https://ieeexplore.ieee.org/document/7178964
    * https://openslr.org/12/
2. Samrómur Children 21.09
    * Mena, Carlos; et al., 2021, Samromur Children 21.09, CLARIN-IS, http://hdl.handle.net/20.500.12537/185.
    * https://repository.clarin.is/repository/xmlui/handle/20.500.12537/185
    * https://openslr.org/117/
3. VoxCeleb 1 and 2
    * A. Nagrani*, J. S. Chung*, A. Zisserman, "VoxCeleb: a large-scale speaker identification dataset", Interspeech 2017
    * J. S. Chung*, A. Nagrani*, A. Zisserman, "VoxCeleb2: Deep Speaker Recognition", Interspeech 2018
    * A. Nagrani*, J. S. Chung*, W. Xie, A. Zisserman, "VoxCeleb: Large-scale speaker verification in the wild", Computer Speech and Language, 2019
    * https://huggingface.co/datasets/ProgramComputer/voxceleb/tree/main/vox2

### Training Procedure

[Notebook 02 of the ClassicVC repository](https://github.com/lyodos/classic-vc) provides the procedure for data preparation.

[Notebook 03 of the ClassicVC repository](https://github.com/lyodos/classic-vc) provides the training code.