Lyodos committed
Commit dbccd16
1 parent: 0c2898e

Update README.md

Files changed (1): README.md (+106 −3)

---
license: mit
---

# ClassicVC

ClassicVC is an any-to-any voice conversion model that lets users design their own speaker styles
by selecting coordinates in continuous latent spaces.
The model components are implemented in PyTorch and are fully compatible with ONNX.
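The idea of designing a speaker style by picking latent coordinates can be sketched as follows. This is a hypothetical illustration only: the 128-dimension size, the random stand-in embeddings, and the `blend_styles` helper are assumptions for the sketch, not part of the ClassicVC API.

```python
import numpy as np

# Hypothetical sketch: a speaker style is a point in a continuous latent
# space, so new styles can be made by blending existing coordinates.
# The dimension (128) and the embeddings are illustrative assumptions.
rng = np.random.default_rng(seed=0)
speaker_a = rng.normal(size=128)  # stand-in for one speaker's latent coordinates
speaker_b = rng.normal(size=128)  # stand-in for another speaker's coordinates

def blend_styles(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Linearly interpolate between two latent coordinates (0.0 <= t <= 1.0)."""
    return (1.0 - t) * a + t * b

custom_style = blend_styles(speaker_a, speaker_b, 0.3)
print(custom_style.shape)  # (128,)
```

A point chosen this way need not correspond to any real speaker, which is what makes the latent-space formulation attractive for style design.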

[MMCXLI](https://github.com/lyodos) provides a dedicated graphical user interface (GUI) for ClassicVC.
It runs on wxPython and ONNX Runtime.
Users can download the ONNX files and try out speech conversion
without having to install PyTorch or train a model on their own voice data.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Lyodos (Lyodos the City of the Museum)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [GitHub](https://github.com/lyodos/classic-vc)

----

## Uses

Under the MIT License, users may use the model code and checkpoints for research purposes.
They are provided with no guarantees.

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Out-of-Scope Use

This model was prototyped as a hobbyist's research into any-to-any voice conversion,
and we make no guarantees, especially regarding its reliability or real-time operation.

As for use in situations involving an unspecified audience, such as web broadcasting,
or in mission-critical applications, including medical, transportation, infrastructure, and weapon systems:
since the MIT License is the only stated license, we as the developer do not prohibit such use, but we do not encourage it.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

[Notebook 01 of the ClassicVC repository](https://github.com/lyodos/classic-vc) walks through offline (non-real-time) voice conversion.

[The MMCXLI repository](https://github.com/lyodos/mmcxli) provides the GUI, which requires a local Python environment.
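Offline conversion means the whole utterance is available up front, rather than arriving as a live stream. The sketch below illustrates only that processing pattern; the frame size and the placeholder `identity_convert` function are assumptions for illustration and do not reflect the actual notebook code.

```python
import numpy as np

# Illustrative sketch of offline (non-real-time) processing: cut the
# full utterance into frames, convert each frame, and concatenate.
# identity_convert() is a placeholder for a real conversion model.
def identity_convert(frame: np.ndarray) -> np.ndarray:
    return frame  # a real model would map source frames to the target style

def convert_offline(signal: np.ndarray, frame_len: int = 1024) -> np.ndarray:
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    return np.concatenate([identity_convert(f) for f in frames])

utterance = np.sin(np.linspace(0, 100, 16000))  # one second of fake audio at 16 kHz
converted = convert_offline(utterance)
print(converted.shape)  # (16000,)
```

Because the entire signal is known in advance, offline conversion avoids the latency and buffering constraints that make real-time operation hard, which is consistent with the caveat above about real-time use.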

----

## Training Details

### Training Data

The model checkpoints provided here were trained on the following three datasets.

1. LibriSpeech ASR corpus
   * V. Panayotov, G. Chen, D. Povey and S. Khudanpur, "Librispeech: An ASR corpus based on public domain audio books," 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia, 2015, pp. 5206-5210, doi: 10.1109/ICASSP.2015.7178964.
   * https://ieeexplore.ieee.org/document/7178964
   * https://openslr.org/12/

2. Samrómur Children 21.09
   * Mena, Carlos; et al., 2021, Samromur Children 21.09, CLARIN-IS, http://hdl.handle.net/20.500.12537/185.
   * https://repository.clarin.is/repository/xmlui/handle/20.500.12537/185
   * https://openslr.org/117/

3. VoxCeleb 1 and 2
   * A. Nagrani*, J. S. Chung*, A. Zisserman, "VoxCeleb: a large-scale speaker identification dataset," Interspeech 2017.
   * J. S. Chung*, A. Nagrani*, A. Zisserman, "VoxCeleb2: Deep Speaker Recognition," Interspeech 2018.
   * A. Nagrani*, J. S. Chung*, W. Xie, A. Zisserman, "VoxCeleb: Large-scale speaker verification in the wild," Computer Speech and Language, 2019.
   * https://huggingface.co/datasets/ProgramComputer/voxceleb/tree/main/vox2

### Training Procedure

The [Notebook 02 of the ClassicVC repository](https://github.com/lyodos/classic-vc) provides the procedure for data preparation.

The [Notebook 03 of the ClassicVC repository](https://github.com/lyodos/classic-vc) provides the training code.