|
--- |
|
title: salad bowl (vampnet) |
|
emoji: 🥗 |
|
colorFrom: yellow |
|
colorTo: green |
|
sdk: gradio |
|
sdk_version: 4.37.2 |
|
python_version: 3.9.17 |
|
app_file: app.py |
|
pinned: false |
|
license: cc-by-nc-4.0 |
|
--- |
|
|
|
# VampNet |
|
|
|
This repository contains recipes for training generative music models on top of the Descript Audio Codec. |
|
|
|
# Setting up |
|
|
|
**Requires Python 3.9**. |
|
|
|
You'll need a Python 3.9 environment to run VampNet. This is due to a [known issue with madmom](https://github.com/hugofloresgarcia/vampnet/issues/15).
|
|
|
For example, using conda:
|
```bash |
|
conda create -n vampnet python=3.9 |
|
conda activate vampnet |
|
``` |
|
|
|
Install VampNet:
|
|
|
```bash |
|
git clone https://github.com/hugofloresgarcia/vampnet.git |
|
pip install -e ./vampnet |
|
``` |
|
|
|
# Usage |
|
|
|
Quick start!
|
```python |
|
import random |
|
import vampnet |
|
import audiotools as at |
|
|
|
# load the default vampnet model |
|
interface = vampnet.interface.Interface.default() |
|
|
|
# list available finetuned models |
|
finetuned_model_choices = interface.available_models() |
|
print(f"available finetuned models: {finetuned_model_choices}") |
|
|
|
# pick a random finetuned model |
|
model_choice = random.choice(finetuned_model_choices) |
|
print(f"choosing model: {model_choice}") |
|
|
|
# load a finetuned model |
|
interface.load_finetuned(model_choice) |
|
|
|
# load an example audio file |
|
signal = at.AudioSignal("assets/example.wav") |
|
|
|
# get the tokens for the audio |
|
codes = interface.encode(signal) |
|
|
|
# build a mask for the audio |
|
mask = interface.build_mask( |
|
codes, signal, |
|
periodic_prompt=7, |
|
upper_codebook_mask=3, |
|
) |
|
|
|
# generate the output tokens |
|
output_tokens = interface.vamp( |
|
codes, mask, return_mask=False, |
|
temperature=1.0, |
|
typical_filtering=True, |
|
) |
|
|
|
# convert them to a signal |
|
output_signal = interface.decode(output_tokens) |
|
|
|
# save the output signal |
|
output_signal.write("scratch/output.wav") |
|
``` |
|
|
|
|
|
## Launching the Gradio Interface |
|
You can launch a Gradio UI to play with VampNet.
|
|
|
```bash |
|
python app.py --args.load conf/interface.yml --Interface.device cuda |
|
``` |
|
|
|
# Training / Fine-tuning |
|
|
|
## Training a model |
|
|
|
To train a model, run the following script: |
|
|
|
```bash |
|
python scripts/exp/train.py --args.load conf/vampnet.yml --save_path /path/to/checkpoints |
|
``` |
|
|
|
For multi-GPU training, use torchrun:
|
|
|
```bash |
|
torchrun --nproc_per_node gpu scripts/exp/train.py --args.load conf/vampnet.yml --save_path path/to/ckpt |
|
``` |
|
|
|
You can edit `conf/vampnet.yml` to change the dataset paths or any training hyperparameters. |
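
For a sense of the shape, here's a hypothetical excerpt (the key names below are illustrative, not copied from the file; check `conf/vampnet.yml` for the actual keys):

```yaml
# hypothetical excerpt -- check conf/vampnet.yml for the actual key names
AudioLoader.sources:             # dataset paths
  - /path/to/your/audio/folder
batch_size: 8                    # training hyperparameters
num_iters: 250000
```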
|
|
|
For coarse2fine models, you can use `conf/c2f.yml` as a starting configuration. |
|
|
|
See `python scripts/exp/train.py -h` for a list of options. |
|
|
|
## Debugging training |
|
|
|
To debug training, it's easiest to run with a single GPU and 0 dataloader workers:
|
|
|
```bash |
|
CUDA_VISIBLE_DEVICES=0 python -m pdb scripts/exp/train.py --args.load conf/vampnet.yml --save_path /path/to/checkpoints --num_workers 0 |
|
``` |
|
|
|
## Fine-tuning |
|
To fine-tune a model, use the script in `scripts/exp/fine_tune.py` to generate 3 configuration files: `c2f.yml`, `coarse.yml`, and `interface.yml`. |
|
The first two are used to fine-tune the coarse-to-fine (c2f) and coarse models, respectively. The last one is used to launch the Gradio interface.
|
|
|
```bash |
|
python scripts/exp/fine_tune.py "/path/to/audio1.mp3 /path/to/audio2/ /path/to/audio3.wav" <fine_tune_name> |
|
``` |
|
|
|
This will create a folder under `conf/generated/<fine_tune_name>/` with the 3 configuration files.
|
|
|
The save_paths will be set to `runs/<fine_tune_name>/coarse` and `runs/<fine_tune_name>/c2f`. |
|
|
|
Launch the coarse job:
|
```bash |
|
python scripts/exp/train.py --args.load conf/generated/<fine_tune_name>/coarse.yml |
|
``` |
|
|
|
This will save the coarse model to `runs/<fine_tune_name>/coarse/ckpt/best/`.
|
|
|
Launch the c2f job:
|
```bash |
|
python scripts/exp/train.py --args.load conf/generated/<fine_tune_name>/c2f.yml |
|
``` |
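
Once both jobs have finished, you can launch the Gradio interface with the generated config (a sketch, mirroring the launch command shown earlier; the generated `interface.yml` points at the fine-tuned checkpoints):

```bash
python app.py --args.load conf/generated/<fine_tune_name>/interface.yml --Interface.device cuda
```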
|
|
|
## A note on argbind |
|
This repository relies on [argbind](https://github.com/pseeth/argbind) to manage CLIs and config files. |
|
Config files are stored in the `conf/` folder. |
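
Here's a minimal sketch of the pattern (a toy example, not code from this repository): argbind turns a bound function's keyword arguments into CLI flags like `--train.lr` and matching YAML keys, which is why the commands above pass configs with `--args.load`.

```python
import argbind

@argbind.bind()
def train(lr: float = 1e-4, batch_size: int = 8):
    # lr and batch_size can now be set on the CLI (--train.lr 3e-4)
    # or as keys in a YAML file passed with --args.load conf.yml.
    print(f"training with lr={lr}, batch_size={batch_size}")

if __name__ == "__main__":
    args = argbind.parse_args()
    with argbind.scope(args):
        train()
```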
|
|
|
### Licensing for Pretrained Models
|
The weights for the models are licensed [`CC BY-NC-SA 4.0`](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.ml). Likewise, any VampNet models fine-tuned on the pretrained models are also licensed [`CC BY-NC-SA 4.0`](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.ml). |
|
|
|
Download the pretrained models from [this link](https://zenodo.org/record/8136629). Then, extract the models to the `models/` folder. |
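
For example (a sketch using a placeholder filename; browse the Zenodo record for the actual files):

```bash
# <model_file> is a placeholder -- see the Zenodo record for the real filenames
mkdir -p models
wget -P models/ "https://zenodo.org/record/8136629/files/<model_file>"
```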
|