promptttspp / data_prep /README.md

MasayaKawamura

Initial commit

82334b0 4 months ago


	# data_prep

	This directory contains the following data preparation scripts:

	1. MFA data preparation: Code for extracting phone alignments with the Montreal Forced Aligner (MFA).
	2. Style prompt data preparation: Code for preparing synthetic annotations of style prompts.

	## 0. Download LibriTTS_R

	Before running any scripts, place the [LibriTTS-R](https://www.openslr.org/141/) dataset in `./LibriTTS_R`. The directory must have the following structure:

	```
	LibriTTS_R/
	├── BOOKS.txt
	├── CHAPTERS.txt
	├── LICENSE.txt
	├── NOTE.txt
	├── README_librispeech.txt
	├── README_libritts.txt
	├── README_libritts_r.txt
	├── SPEAKERS.txt
	├── dev-clean
	├── dev-other
	├── reader_book.tsv
	├── speakers.tsv
	├── test-clean
	├── test-other
	├── train-clean-100
	├── train-clean-360
	└── train-other-500
	```
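
As a quick sanity check before starting the long-running steps, the layout above can be verified with a few lines of Python. This is a minimal sketch and not part of the repository: the helper name `missing_subsets` is hypothetical, and it only checks the seven subset directories, not the metadata files.

```python
from pathlib import Path

# Subset directories listed in the tree above.
EXPECTED_SUBSETS = [
    "dev-clean", "dev-other", "test-clean", "test-other",
    "train-clean-100", "train-clean-360", "train-other-500",
]


def missing_subsets(root):
    """Return the expected subset directories that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_SUBSETS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_subsets("./LibriTTS_R")
    if missing:
        print("Missing subsets:", ", ".join(missing))
    else:
        print("All expected subsets found.")
```

An empty result means every subset directory is in place; anything else lists what still needs to be downloaded or moved.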

	## 1. MFA data preparation

	### Setup for MFA

	```
	conda install -c conda-forge montreal-forced-aligner
	```

	```
	mfa model download dictionary english_us_arpa
	mfa model download acoustic english_us_arpa
	```

	### Usage

	Please check `runall_mfa.sh` for the usage.

	Note that running MFA for all the utterances in LibriTTS-R takes a long time (likely a few days).


	### Directory structure

	After all the data preparation steps, the following directories will be created:

	- `libritts_r_per_spk_cleaned`
	  - `${spk}`
	    - `textgrid`: TextGrid files
	    - `wav24k`: 24 kHz wav files

	```
	├── 100
	│ ├── textgrid
	│ └── wav24k
	├── 1001
	│ ├── textgrid
	│ └── wav24k
	├── 1006
	│ ├── textgrid
	│ └── wav24k
	...
	```
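
Because MFA can skip utterances it fails to align, it may be worth checking that each wav file has a matching alignment. The sketch below does this for one speaker directory. It is not part of the repository: the helper name `unaligned_utterances` is hypothetical, and it assumes that files use the `.wav` and `.TextGrid` extensions and that a wav file and its alignment share the same file stem.

```python
from pathlib import Path


def unaligned_utterances(spk_dir):
    """Return stems of wav files in wav24k/ that have no TextGrid in textgrid/."""
    spk_dir = Path(spk_dir)
    # Assumption: a wav file and its alignment share the same stem.
    wavs = {p.stem for p in (spk_dir / "wav24k").glob("*.wav")}
    grids = {p.stem for p in (spk_dir / "textgrid").glob("*.TextGrid")}
    return sorted(wavs - grids)


if __name__ == "__main__":
    root = Path("libritts_r_per_spk_cleaned")
    if root.is_dir():
        for spk in sorted(p for p in root.iterdir() if p.is_dir()):
            missing = unaligned_utterances(spk)
            if missing:
                print(f"{spk.name}: {len(missing)} utterance(s) without alignment")
```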


	## 2. Style prompt data preparation

	Code for estimating per-utterance style tags (e.g., low pitch, normal pitch and high pitch) from the data statistics.
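
To illustrate the idea of deriving tags from data statistics, here is a minimal sketch that bins a per-utterance value into three labels. It is not the repository's implementation: the helper name `make_tagger`, the example values, and the choice of thresholds at the mean plus or minus one standard deviation are all assumptions. The actual rule is defined by the scripts that `runall_style_prompt_tags.sh` calls.

```python
import statistics


def make_tagger(values, labels=("low", "normal", "high"), width=1.0):
    """Build a function that maps a value to a label.

    Assumption: thresholds sit at mean +/- width * std of the given values.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    lo, hi = mean - width * std, mean + width * std

    def tag(value):
        if value < lo:
            return labels[0]
        if value > hi:
            return labels[2]
        return labels[1]

    return tag


# Hypothetical per-utterance mean pitch values in Hz.
pitch_values = [100.0, 110.0, 120.0, 130.0, 140.0]
tag_pitch = make_tagger(pitch_values)
print(tag_pitch(95.0), "pitch")
```

The same helper could be reused for other per-utterance statistics by passing a different list of values.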

	### Usage

	Please check `runall_style_prompt_tags.sh` for the usage.