Spaces: Runtime error
Commit c395ff4 · Parent: 7613654
update

Changed files:
- README.md +7 -436
- bert_vits2/Model/Azuma/G_17400.pth +3 -0
- bert_vits2/Model/Azuma/config.json +95 -0
- bert_vits2/bert/pytorch_model.bin +3 -0
- config.py +2 -0
README.md CHANGED
@@ -1,436 +1,7 @@

Removed:

<img src="https://img.shields.io/badge/python-3.10-green">
<a href="https://hub.docker.com/r/artrajz/vits-simple-api">
<img src="https://img.shields.io/docker/pulls/artrajz/vits-simple-api"></a>
</p>
<a href="https://github.com/Artrajz/vits-simple-api/blob/main/README.md">English</a>|<a href="https://github.com/Artrajz/vits-simple-api/blob/main/README_zh.md">中文文档</a>
<br/>
</div>

# Feature

- [x] VITS text-to-speech and voice conversion
- [x] HuBert-soft VITS
- [x] [vits_chinese](https://github.com/PlayVoice/vits_chinese)
- [x] [Bert-VITS2](https://github.com/Stardust-minus/Bert-VITS2)
- [x] W2V2 VITS / [emotional-vits](https://github.com/innnky/emotional-vits) dimensional emotion model
- [x] Support for loading multiple models
- [x] Automatic language recognition and processing: the language detection scope is set according to each model's cleaner, and a custom language range can be specified
- [x] Customizable default parameters
- [x] Long text batch processing
- [x] GPU-accelerated inference
- [x] SSML (Speech Synthesis Markup Language) work in progress...

## demo

[](https://huggingface.co/spaces/Artrajz/vits-simple-api)

Please note that different IDs may support different languages. [speakers](https://artrajz-vits-simple-api.hf.space/voice/speakers)

- `https://artrajz-vits-simple-api.hf.space/voice/vits?text=你好,こんにちは&id=164`
- `https://artrajz-vits-simple-api.hf.space/voice/vits?text=Difficult the first time, easy the second.&id=4`
- excited: `https://artrajz-vits-simple-api.hf.space/voice/w2v2-vits?text=こんにちは&id=3&emotion=111`
- whispered: `https://artrajz-vits-simple-api.hf.space/w2v2-vits?text=こんにちは&id=3&emotion=2077`

https://user-images.githubusercontent.com/73542220/237995061-c1f25b4e-dd86-438a-9363-4bb1fe65b425.mov
# Deploy

## Docker (recommended for Linux)

### Docker image pull script

```
bash -c "$(wget -O- https://raw.githubusercontent.com/Artrajz/vits-simple-api/main/vits-simple-api-installer-latest.sh)"
```

- The platforms currently supported by the Docker images are `linux/amd64` and `linux/arm64` (the arm64 image is CPU-only).
- After a successful pull, a VITS model must be imported before use. Please follow the steps below to import the model.
### Download VITS model

Put the model into `/usr/local/vits-simple-api/Model`

<details><summary>Folder structure</summary><pre><code>
│  hubert-soft-0d54a1f4.pt
│  model.onnx
│  model.yaml
│
├─g
│   config.json
│   G_953000.pth
│
├─louise
│   360_epochs.pth
│   config.json
│
├─Nene_Nanami_Rong_Tang
│   1374_epochs.pth
│   config.json
│
├─Zero_no_tsukaima
│   1158_epochs.pth
│   config.json
│
└─npy
    25ecb3f6-f968-11ed-b094-e0d4e84af078.npy
    all_emotions.npy
</code></pre></details>
### Modify model path

Modify in `/usr/local/vits-simple-api/config.py`

<details><summary>config.py</summary><pre><code>
# Fill in the model path here
MODEL_LIST = [
    # VITS
    [ABS_PATH + "/Model/Nene_Nanami_Rong_Tang/1374_epochs.pth", ABS_PATH + "/Model/Nene_Nanami_Rong_Tang/config.json"],
    [ABS_PATH + "/Model/Zero_no_tsukaima/1158_epochs.pth", ABS_PATH + "/Model/Zero_no_tsukaima/config.json"],
    [ABS_PATH + "/Model/g/G_953000.pth", ABS_PATH + "/Model/g/config.json"],
    # HuBert-VITS (need to configure HUBERT_SOFT_MODEL)
    [ABS_PATH + "/Model/louise/360_epochs.pth", ABS_PATH + "/Model/louise/config.json"],
    # W2V2-VITS (need to configure DIMENSIONAL_EMOTION_NPY)
    [ABS_PATH + "/Model/w2v2-vits/1026_epochs.pth", ABS_PATH + "/Model/w2v2-vits/config.json"],
]
# hubert-vits: hubert soft model
HUBERT_SOFT_MODEL = ABS_PATH + "/Model/hubert-soft-0d54a1f4.pt"
# w2v2-vits: dimensional emotion npy file
# load a single npy: ABS_PATH + "/all_emotions.npy"
# load multiple npy: [ABS_PATH + "/emotions1.npy", ABS_PATH + "/emotions2.npy"]
# load multiple npy from a folder: ABS_PATH + "/Model/npy"
DIMENSIONAL_EMOTION_NPY = ABS_PATH + "/Model/npy"
# w2v2-vits: both `model.onnx` and `model.yaml` must be in the same path.
DIMENSIONAL_EMOTION_MODEL = ABS_PATH + "/Model/model.yaml"
</code></pre></details>
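Each `MODEL_LIST` entry pairs a weights file with its config. As a quick sanity check before startup, the pairs can be verified to exist on disk — a minimal illustrative sketch, not part of the project (the example paths are hypothetical):

```python
import os

# Hypothetical entries in the MODEL_LIST format: [weights_path, config_path]
ABS_PATH = "/usr/local/vits-simple-api"
MODEL_LIST = [
    [ABS_PATH + "/Model/g/G_953000.pth", ABS_PATH + "/Model/g/config.json"],
]

def missing_files(model_list):
    """Return every path from the (weights, config) pairs that does not exist."""
    return [p for pair in model_list for p in pair if not os.path.exists(p)]

# On a machine without the models downloaded, the paths are reported here.
print(missing_files(MODEL_LIST))
```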
### Startup

`docker compose up -d`

Or execute the pull script again.

### Image update

Run the Docker image pull script again.

## Virtual environment deployment

### Clone

`git clone https://github.com/Artrajz/vits-simple-api.git`

### Download python dependencies

A Python virtual environment is recommended.

`pip install -r requirements.txt`

fastText may fail to install on Windows. You can install it with the following command, or download a prebuilt wheel [here](https://www.lfd.uci.edu/~gohlke/pythonlibs/#fasttext).

```
# python3.10 win_amd64
pip install https://github.com/Artrajz/archived/raw/main/fasttext/fasttext-0.9.2-cp310-cp310-win_amd64.whl
```

### Download VITS model

Put the model into `/path/to/vits-simple-api/Model`
<details><summary>Folder structure</summary><pre><code>
│  hubert-soft-0d54a1f4.pt
│  model.onnx
│  model.yaml
│
├─g
│   config.json
│   G_953000.pth
│
├─louise
│   360_epochs.pth
│   config.json
│
├─Nene_Nanami_Rong_Tang
│   1374_epochs.pth
│   config.json
│
├─Zero_no_tsukaima
│   1158_epochs.pth
│   config.json
│
└─npy
    25ecb3f6-f968-11ed-b094-e0d4e84af078.npy
    all_emotions.npy
</code></pre></details>
### Modify model path

Modify in `/path/to/vits-simple-api/config.py`

<details><summary>config.py</summary><pre><code>
# Fill in the model path here
MODEL_LIST = [
    # VITS
    [ABS_PATH + "/Model/Nene_Nanami_Rong_Tang/1374_epochs.pth", ABS_PATH + "/Model/Nene_Nanami_Rong_Tang/config.json"],
    [ABS_PATH + "/Model/Zero_no_tsukaima/1158_epochs.pth", ABS_PATH + "/Model/Zero_no_tsukaima/config.json"],
    [ABS_PATH + "/Model/g/G_953000.pth", ABS_PATH + "/Model/g/config.json"],
    # HuBert-VITS (need to configure HUBERT_SOFT_MODEL)
    [ABS_PATH + "/Model/louise/360_epochs.pth", ABS_PATH + "/Model/louise/config.json"],
    # W2V2-VITS (need to configure DIMENSIONAL_EMOTION_NPY)
    [ABS_PATH + "/Model/w2v2-vits/1026_epochs.pth", ABS_PATH + "/Model/w2v2-vits/config.json"],
]
# hubert-vits: hubert soft model
HUBERT_SOFT_MODEL = ABS_PATH + "/Model/hubert-soft-0d54a1f4.pt"
# w2v2-vits: dimensional emotion npy file
# load a single npy: ABS_PATH + "/all_emotions.npy"
# load multiple npy: [ABS_PATH + "/emotions1.npy", ABS_PATH + "/emotions2.npy"]
# load multiple npy from a folder: ABS_PATH + "/Model/npy"
DIMENSIONAL_EMOTION_NPY = ABS_PATH + "/Model/npy"
# w2v2-vits: both `model.onnx` and `model.yaml` must be in the same path.
DIMENSIONAL_EMOTION_MODEL = ABS_PATH + "/Model/model.yaml"
</code></pre></details>
### Startup

`python app.py`

# GPU accelerated

## Windows

### Install CUDA

Check the highest version of CUDA supported by your graphics card:

```
nvidia-smi
```

Taking CUDA 11.7 as an example, download it from the [official website](https://developer.nvidia.com/cuda-11-7-0-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exe_local).

### Install GPU version of PyTorch

1.13.1+cu117 is recommended; other versions may have memory instability issues.

```
pip install torch==1.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
```

## Linux

The installation process is similar, but I don't have the environment to test it.

# Dependency Installation Issues

Since pypi.org does not host a `pyopenjtalk` wheel, it usually has to be built from source, which can be troublesome for some people. You can therefore also install the wheel I built:

```
pip install pyopenjtalk -i https://pypi.artrajz.cn/simple
```
# API

## GET

#### speakers list

- GET http://127.0.0.1:23456/voice/speakers

Returns the mapping table of role IDs to speaker names.

#### voice vits

- GET http://127.0.0.1:23456/voice/vits?text=text

Default values are used when other parameters are not specified.

- GET http://127.0.0.1:23456/voice/vits?text=[ZH]text[ZH][JA]text[JA]&lang=mix

When lang=mix, the text needs to be annotated with language tags.

- GET http://127.0.0.1:23456/voice/vits?text=text&id=142&format=wav&lang=zh&length=1.4

The text is "text", the role ID is 142, the audio format is wav, the text language is zh, the speech length is 1.4, and the other parameters are default.

#### check

- GET http://127.0.0.1:23456/voice/check?id=0&model=vits

## POST

- See `api_test.py`
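The GET queries above can be assembled programmatically. A minimal sketch using only the standard library — the endpoint and parameter names come from the examples above; the host is assumed to be a default local instance:

```python
from urllib.parse import urlencode

BASE = "http://127.0.0.1:23456"  # assumed default local instance

def vits_url(text, **params):
    """Build a /voice/vits GET URL from the documented query parameters."""
    query = {"text": text, **params}
    return f"{BASE}/voice/vits?{urlencode(query)}"

# Mirrors the third example: id=142, wav output, Chinese text, length 1.4.
print(vits_url("text", id=142, format="wav", lang="zh", length=1.4))
```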
## API KEY

Set `API_KEY_ENABLED = True` in `config.py` to enable API key authentication. The API key is set via `API_KEY = "api-key"`.
After enabling it, add the `api_key` parameter to GET requests, and add the `X-API-KEY` header to POST requests.
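Both authentication styles can be sketched as follows; only the parameter and header names come from the text above, the rest (host, request body) is illustrative:

```python
from urllib.parse import urlencode

API_KEY = "api-key"  # the value configured in config.py

# GET: the key travels as a query parameter.
get_url = "http://127.0.0.1:23456/voice/vits?" + urlencode({"text": "hello", "api_key": API_KEY})

# POST: the key travels in the X-API-KEY request header.
post_headers = {"X-API-KEY": API_KEY}

print(get_url)
print(post_headers)
```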
# Parameter

## VITS

| Name | Parameter | Required | Default | Type | Instruction |
| ---------------------- | --------- | -------- | ---------------- | ----- | ------------------------------------------------------------ |
| Synthesized text | text | true | | str | Text needed for voice synthesis. |
| Speaker ID | id | false | From `config.py` | int | The speaker ID. |
| Audio format | format | false | From `config.py` | str | Supports wav, ogg, silk, mp3, flac |
| Text language | lang | false | From `config.py` | str | The language of the text to be synthesized. Available options are auto, zh, ja, and mix. When lang=mix, the text should be wrapped in [ZH] or [JA]. The default mode is auto, which automatically detects the language of the text. |
| Audio length | length | false | From `config.py` | float | Adjusts the length of the synthesized speech, which is equivalent to adjusting its speed. The larger the value, the slower the speed. |
| Noise | noise | false | From `config.py` | float | Sample noise, controlling the randomness of the synthesis. |
| SDP noise | noisew | false | From `config.py` | float | Stochastic Duration Predictor noise, controlling the length of phoneme pronunciation. |
| Segmentation threshold | max | false | From `config.py` | int | Divides the text into segments at punctuation marks, combining them into one segment until the length exceeds max. If max<=0, the text is not segmented. |
| Streaming response | streaming | false | false | bool | Streams the synthesized speech for a faster initial response. |
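The `max` segmentation behaviour described above can be illustrated with a short sketch. This follows the documented rule (split at punctuation, merge until a segment exceeds `max`, no segmentation when `max<=0`) but is not the project's actual implementation:

```python
import re

def segment(text, max_len):
    """Split text after punctuation, then merge pieces until a segment exceeds max_len."""
    if max_len <= 0:          # documented: max<=0 disables segmentation
        return [text]
    pieces = [p for p in re.split(r"(?<=[。!?.!?,,])", text) if p]
    segments, current = [], ""
    for piece in pieces:
        current += piece
        if len(current) > max_len:
            segments.append(current)
            current = ""
    if current:
        segments.append(current)
    return segments

print(segment("你好。今天天气不错。我们出去走走吧。", 5))
```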
## VITS voice conversion

| Name | Parameter | Required | Default | Type | Instruction |
| -------------- | ----------- | -------- | ------- | ---- | --------------------------------------------------------- |
| Uploaded Audio | upload | true | | file | The audio file to be uploaded. It should be in wav or ogg format. |
| Source Role ID | original_id | true | | int | The role ID of the uploaded audio. |
| Target Role ID | target_id | true | | int | The ID of the target role to convert the audio to. |

## HuBert-VITS

| Name | Parameter | Required | Default | Type | Instruction |
| ----------------- | --------- | -------- | ------- | ----- | ------------------------------------------------------------ |
| Uploaded Audio | upload | true | | file | The audio file to be uploaded. It should be in wav or ogg format. |
| Target speaker ID | id | true | | int | The target speaker ID. |
| Audio format | format | true | | str | wav, ogg, silk |
| Audio length | length | true | | float | Adjusts the length of the synthesized speech, which is equivalent to adjusting its speed. The larger the value, the slower the speed. |
| Noise | noise | true | | float | Sample noise, controlling the randomness of the synthesis. |
| SDP noise | noisew | true | | float | Stochastic Duration Predictor noise, controlling the length of phoneme pronunciation. |

## W2V2-VITS

| Name | Parameter | Required | Default | Type | Instruction |
| ---------------------- | --------- | -------- | ---------------- | ----- | ------------------------------------------------------------ |
| Synthesized text | text | true | | str | Text needed for voice synthesis. |
| Speaker ID | id | false | From `config.py` | int | The speaker ID. |
| Audio format | format | false | From `config.py` | str | Supports wav, ogg, silk, mp3, flac |
| Text language | lang | false | From `config.py` | str | The language of the text to be synthesized. Available options are auto, zh, ja, and mix. When lang=mix, the text should be wrapped in [ZH] or [JA]. The default mode is auto, which automatically detects the language of the text. |
| Audio length | length | false | From `config.py` | float | Adjusts the length of the synthesized speech, which is equivalent to adjusting its speed. The larger the value, the slower the speed. |
| Noise | noise | false | From `config.py` | float | Sample noise, controlling the randomness of the synthesis. |
| SDP noise | noisew | false | From `config.py` | float | Stochastic Duration Predictor noise, controlling the length of phoneme pronunciation. |
| Segmentation threshold | max | false | From `config.py` | int | Divides the text into segments at punctuation marks, combining them into one segment until the length exceeds max. If max<=0, the text is not segmented. |
| Dimensional emotion | emotion | false | 0 | int | The range depends on the emotion reference file in npy format; for example, [innnky](https://huggingface.co/spaces/innnky/nene-emotion/tree/main)'s all_emotions.npy covers 0-5457. |

## Dimensional emotion

| Name | Parameter | Required | Default | Type | Instruction |
| -------------- | --------- | -------- | ------- | ---- | ------------------------------------------------------------ |
| Uploaded Audio | upload | true | | file | Returns the npy file that stores the dimensional emotion vector. |

## Bert-VITS2

| Name | Parameter | Required | Default | Type | Instruction |
| ---------------------- | --------- | -------- | ---------------- | ----- | ------------------------------------------------------------ |
| Synthesized text | text | true | | str | Text needed for voice synthesis. |
| Speaker ID | id | false | From `config.py` | int | The speaker ID. |
| Audio format | format | false | From `config.py` | str | Supports wav, ogg, silk, mp3, flac |
| Text language | lang | false | From `config.py` | str | auto is the default mode and automatically detects the language; it currently only detects the language of an entire text passage and cannot distinguish languages per sentence. The other available options are zh and ja. |
| Audio length | length | false | From `config.py` | float | Adjusts the length of the synthesized speech, which is equivalent to adjusting its speed. The larger the value, the slower the speed. |
| Noise | noise | false | From `config.py` | float | Sample noise, controlling the randomness of the synthesis. |
| SDP noise | noisew | false | From `config.py` | float | Stochastic Duration Predictor noise, controlling the length of phoneme pronunciation. |
| Segmentation threshold | max | false | From `config.py` | int | Divides the text into segments at punctuation marks, combining them into one segment until the length exceeds max. If max<=0, the text is not segmented. |
| SDP/DP mix ratio | sdp_ratio | false | From `config.py` | int | The theoretical proportion of SDP during synthesis; the higher the ratio, the larger the variance in the synthesized voice's intonation. |
## SSML (Speech Synthesis Markup Language)

Supported elements and attributes:

`speak` element

| Attribute | Instruction | Required |
| --------- | ------------------------------------------------------------ | -------- |
| id | Default value is retrieved from `config.py` | false |
| lang | Default value is retrieved from `config.py` | false |
| length | Default value is retrieved from `config.py` | false |
| noise | Default value is retrieved from `config.py` | false |
| noisew | Default value is retrieved from `config.py` | false |
| max | Splits text into segments at punctuation marks; when the accumulated segment length exceeds `max`, it is treated as one segment. `max<=0` means no segmentation. The default value is 0. | false |
| model | Default is `vits`. Options: `w2v2-vits`, `emotion-vits` | false |
| emotion | Only effective with `w2v2-vits` or `emotion-vits`. The range depends on the npy emotion reference file. | false |

`voice` element

Takes priority over `speak`.

| Attribute | Instruction | Required |
| --------- | ------------------------------------------------------------ | -------- |
| id | Default value is retrieved from `config.py` | false |
| lang | Default value is retrieved from `config.py` | false |
| length | Default value is retrieved from `config.py` | false |
| noise | Default value is retrieved from `config.py` | false |
| noisew | Default value is retrieved from `config.py` | false |
| max | Splits text into segments at punctuation marks; when the accumulated segment length exceeds `max`, it is treated as one segment. `max<=0` means no segmentation. The default value is 0. | false |
| model | Default is `vits`. Options: `w2v2-vits`, `emotion-vits` | false |
| emotion | Only effective with `w2v2-vits` or `emotion-vits` | false |

`break` element

| Attribute | Instruction | Required |
| --------- | ------------------------------------------------------------ | -------- |
| strength | x-weak, weak, medium (default), strong, x-strong | false |
| time | The absolute duration of a pause, in seconds (such as `2s`) or milliseconds (such as `500ms`). Valid values range from 0 to 5000 milliseconds. If you set a value greater than the supported maximum, the service uses `5000ms`. If the `time` attribute is set, the `strength` attribute is ignored. | false |

| Strength | Relative Duration |
| :------- | :---------------- |
| x-weak | 250 ms |
| weak | 500 ms |
| medium | 750 ms |
| strong | 1000 ms |
| x-strong | 1250 ms |
Example

```xml
<speak lang="zh" format="mp3" length="1.2">
    <voice id="92">这几天心里颇不宁静。</voice>
    <voice id="125">今晚在院子里坐着乘凉,忽然想起日日走过的荷塘,在这满月的光里,总该另有一番样子吧。</voice>
    <voice id="142">月亮渐渐地升高了,墙外马路上孩子们的欢笑,已经听不见了;</voice>
    <voice id="98">妻在屋里拍着闰儿,迷迷糊糊地哼着眠歌。</voice>
    <voice id="120">我悄悄地披了大衫,带上门出去。</voice><break time="2s"/>
    <voice id="121">沿着荷塘,是一条曲折的小煤屑路。</voice>
    <voice id="122">这是一条幽僻的路;白天也少人走,夜晚更加寂寞。</voice>
    <voice id="123">荷塘四面,长着许多树,蓊蓊郁郁的。</voice>
    <voice id="124">路的一旁,是些杨柳,和一些不知道名字的树。</voice>
    <voice id="125">没有月光的晚上,这路上阴森森的,有些怕人。</voice>
    <voice id="126">今晚却很好,虽然月光也还是淡淡的。</voice><break time="2s"/>
    <voice id="127">路上只我一个人,背着手踱着。</voice>
    <voice id="128">这一片天地好像是我的;我也像超出了平常的自己,到了另一个世界里。</voice>
    <voice id="129">我爱热闹,也爱冷静;<break strength="x-weak"/>爱群居,也爱独处。</voice>
    <voice id="130">像今晚上,一个人在这苍茫的月下,什么都可以想,什么都可以不想,便觉是个自由的人。</voice>
    <voice id="131">白天里一定要做的事,一定要说的话,现在都可不理。</voice>
    <voice id="132">这是独处的妙处,我且受用这无边的荷香月色好了。</voice>
</speak>
```

# Communication

For learning and communication; currently there is only a Chinese [QQ group](https://qm.qq.com/cgi-bin/qm/qr?k=-1GknIe4uXrkmbDKBGKa1aAUteq40qs_&jump_from=webapi&authKey=x5YYt6Dggs1ZqWxvZqvj3fV8VUnxRyXm5S5Kzntc78+Nv3iXOIawplGip9LWuNR/)

# Acknowledgements

- vits: https://github.com/jaywalnut310/vits
- MoeGoe: https://github.com/CjangCjengh/MoeGoe
- emotional-vits: https://github.com/innnky/emotional-vits
- vits-uma-genshin-honkai: https://huggingface.co/spaces/zomehwh/vits-uma-genshin-honkai
- vits_chinese: https://github.com/PlayVoice/vits_chinese
- Bert_VITS2: https://github.com/fishaudio/Bert-VITS2
Added:

license: mit
title: vits-simple-api
sdk: gradio
pinned: true
python_version: 3.10.11
emoji: 👀
app_file: app.py
bert_vits2/Model/Azuma/G_17400.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:184324a03109748e68f7e6587433ef2889e1c57aab84ebbe7825bf3bf0fbfc63
size 629537628
bert_vits2/Model/Azuma/config.json ADDED
@@ -0,0 +1,95 @@
{
  "train": {
    "log_interval": 10,
    "eval_interval": 100,
    "seed": 52,
    "epochs": 10000,
    "learning_rate": 0.0003,
    "betas": [0.8, 0.99],
    "eps": 1e-09,
    "batch_size": 18,
    "fp16_run": false,
    "lr_decay": 0.999875,
    "segment_size": 16384,
    "init_lr_ratio": 1,
    "warmup_epochs": 0,
    "c_mel": 45,
    "c_kl": 1.0
  },
  "data": {
    "use_mel_posterior_encoder": false,
    "training_files": "filelists/train.list",
    "validation_files": "filelists/val.list",
    "max_wav_value": 32768.0,
    "sampling_rate": 44100,
    "filter_length": 2048,
    "hop_length": 512,
    "win_length": 2048,
    "n_mel_channels": 128,
    "mel_fmin": 0.0,
    "mel_fmax": null,
    "add_blank": true,
    "n_speakers": 1,
    "cleaned_text": true,
    "spk2id": {
      "Azuma": 0
    }
  },
  "model": {
    "use_spk_conditioned_encoder": true,
    "use_noise_scaled_mas": true,
    "use_mel_posterior_encoder": false,
    "use_duration_discriminator": true,
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "kernel_size": 3,
    "p_dropout": 0.1,
    "resblock": "1",
    "resblock_kernel_sizes": [3, 7, 11],
    "resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
    "upsample_rates": [8, 8, 2, 2, 2],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [16, 16, 8, 2, 2],
    "n_layers_q": 3,
    "use_spectral_norm": false,
    "gin_channels": 256
  }
}
bert_vits2/bert/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4ac62d49144d770c5ca9a5d1d3039c4995665a080febe63198189857c6bd11cd
size 1306484351
config.py CHANGED
@@ -67,6 +67,8 @@ MODEL_LIST = [
     # [ABS_PATH + "/Model/w2v2-vits/1026_epochs.pth", ABS_PATH + "/Model/w2v2-vits/config.json"],
     # Bert-VITS2
     # [ABS_PATH + "/Model/bert_vits2/G_9000.pth", ABS_PATH + "/Model/bert_vits2/config.json"],
+
+    [ABS_PATH + "/bert_vits2/Model/Azuma/G_17400.pth", ABS_PATH + "/bert_vits2/Model/Azuma/config.json"]
 ]

 # hubert-vits: hubert soft model