---
license: mit
language:
- en
library_name: peft
---
# Master Thesis: High-Fidelity Video Background Music Generation using Transformers
This is the corresponding GitLab repository of my Master Thesis. The goal of this thesis is to generate video background music by adapting MusicGen (https://arxiv.org/pdf/2306.05284.pdf) to accept video input as an additional input modality. This is accomplished by mapping video information into the T5 text embedding space on which MusicGen usually operates. To this end, a Transformer Encoder network, called the Video Encoder, is trained for this task (see the sketch after the list below). Two options are foreseen within the training loop for the Video Encoder:
- freezing the weights within the MusicGen Audio Decoder
- adjusting the weights of the MusicGen Audio Decoder with Parameter Efficient Fine-Tuning (PEFT) using LoRA (https://arxiv.org/abs/2106.09685)
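To illustrate the mapping idea, here is a minimal sketch of such a Video Encoder in PyTorch. The class name, feature dimensions, layer counts, and the assumption that per-frame video features have already been extracted by some visual backbone are all illustrative choices, not the exact architecture used in this thesis.

```python
import torch
import torch.nn as nn

class VideoEncoder(nn.Module):
    """Illustrative sketch: maps per-frame video features into the T5
    embedding space that MusicGen conditions on. Dimensions and depth
    are assumptions, not the thesis configuration."""

    def __init__(self, video_feat_dim=512, t5_dim=768, num_layers=4, num_heads=8):
        super().__init__()
        # project raw video features to the T5 embedding dimension
        self.input_proj = nn.Linear(video_feat_dim, t5_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=t5_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

    def forward(self, video_feats):
        # video_feats: (batch, num_frames, video_feat_dim)
        x = self.input_proj(video_feats)
        # output: (batch, num_frames, t5_dim), consumed like T5 text embeddings
        return self.encoder(x)


if __name__ == "__main__":
    # e.g. 16 frames of 512-dim features for a single video
    feats = torch.randn(1, 16, 512)
    print(VideoEncoder()(feats).shape)  # torch.Size([1, 16, 768])
```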
## Installation
- create a Python virtual environment with `Python 3.11`
- check https://pytorch.org/get-started/previous-versions/ to install `PyTorch 2.1.0` with `CUDA` on your machine
- install the local fork of audiocraft: `cd audiocraft; pip install -e .`
- install the other requirements: `pip install -r requirements.txt`
## Folder Structure
- `audiocraft` contains a local fork of the audiocraft library (https://github.com/facebookresearch/audiocraft) with minor changes to the generation method; further information can be found in `code/code_adaptations_audiocraft`.
- `code` contains the code for model `training` and `inference` of video background music
- `datasets` contains the code to create the datasets used for training within `data_preparation` and video examples used for the evaluation in `example_videos`
- `evaluation` contains the code used to evaluate the datasets and the created video embeddings
- `gradio_app` contains the code for the interface to generate video background music
## Training
To train the models, set the training parameters in `training/training_conf.yml` and start training with `python training/training.py`. The model weights will be stored under `training/models_audiocraft` or `training/models_peft`, respectively.
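For the PEFT option, the MusicGen audio decoder can be wrapped with LoRA adapters via the `peft` library. The following is only a rough sketch under assumed hyperparameters and module names; the actual rank, target modules, and configuration are defined by `training/training_conf.yml` and may differ.

```python
from audiocraft.models import MusicGen
from peft import LoraConfig, get_peft_model

# load a pretrained MusicGen checkpoint (the model size here is an assumption)
musicgen = MusicGen.get_pretrained("facebook/musicgen-small")

# LoRA configuration; rank, alpha, and target module names are illustrative
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["out_proj", "linear1", "linear2"],  # assumed layer names
)

# wrap the autoregressive language model (audio decoder) with LoRA adapters;
# only the adapter weights remain trainable
musicgen.lm = get_peft_model(musicgen.lm, lora_config)
musicgen.lm.print_trainable_parameters()
```

With the weight-freezing option, the decoder parameters would instead simply have `requires_grad` set to `False`, so that only the Video Encoder is updated during training.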
## Inference
- start the user interface by running `python gradio_app/app.py` (a minimal sketch of such an app follows below)
- inside the interface, select a video and set the generation parameters
- click on "submit" to start the generation
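For orientation, here is a minimal, hypothetical sketch of how such a Gradio interface could be wired together; the actual `gradio_app/app.py` likely exposes more parameters and calls the thesis-specific generation code instead of the placeholder function used here.

```python
import gradio as gr

def generate_background_music(video_path, duration):
    """Placeholder for the actual pipeline: extract video features, run the
    Video Encoder, and let MusicGen synthesize audio of the requested length."""
    # ... call into the thesis code here and return a path to the generated audio
    raise NotImplementedError

demo = gr.Interface(
    fn=generate_background_music,
    inputs=[
        gr.Video(label="Input video"),
        gr.Slider(5, 30, value=10, step=1, label="Music duration (s)"),
    ],
    outputs=gr.Audio(label="Generated background music"),
    title="Video Background Music Generation",
)

if __name__ == "__main__":
    demo.launch()
```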
## Contact
For any questions, contact me at [email protected]