[Fine-tuning Code] Here is an implementation πŸ‘‹ !

#3 · opened by Gaie

😊 Hi everyone! We are very pleased to announce that align-anything now supports fine-tuning for Qwen2.5-Omni. The code is here πŸ‘‰ https://github.com/PKU-Alignment/align-anything/pull/169.

Compared to the community's implementation, we believe our solution is more user-friendly: after installation, you just need to run the scripts below to start training, without modifying anything! (A quick environment sanity check is sketched after the steps.)

  • Installation:
# We tested on an H800 computing cluster, where this CUDA version works well.
# Adjust the version to match your own cluster.

conda install nvidia/label/cuda-12.2.0::cuda
export CUDA_HOME=$CONDA_PREFIX
cd align-anything
pip install -e .[train]

# For Qwen2.5-Omni support, install the pinned transformers commit
pip uninstall -y transformers
pip install git+https://github.com/huggingface/transformers@3a1ead0aabed473eafe527915eea8c197d424356
pip install -U flash-attn --no-build-isolation
  • Train:
cd scripts
bash qwen_omni_sft.sh
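Before launching training, you can quickly confirm that the pinned transformers commit actually ships the Qwen2.5-Omni classes. A minimal sanity-check sketch, assuming the class names from the Qwen2.5-Omni model card (later transformers commits rename Qwen2_5OmniModel to Qwen2_5OmniForConditionalGeneration):

# Sanity check: this import fails on stock transformers releases that predate Qwen2.5-Omni.
import transformers
from transformers import Qwen2_5OmniModel, Qwen2_5OmniProcessor

print(transformers.__version__)  # expect a dev build installed from the pinned commit

# Loading the processor is a cheap end-to-end check of the install.
processor = Qwen2_5OmniProcessor.from_pretrained("Qwen/Qwen2.5-Omni-7B")
print(type(processor).__name__)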
Gaie changed discussion title from [Fintuning Code] Here is an implementation πŸ‘‹ ! to [Fine-tuning Code] Here is an implementation πŸ‘‹ !

Any way to fine-tune it for a new voice/language?

Sure! We will support text+audio -> text modality fine-tuning soon. Stay tuned!
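For anyone curious what such a sample might look like, below is a hypothetical text+audio -> text training example in the Qwen2.5-Omni chat format. The file path and target text are invented for illustration, and align-anything's actual dataset schema may differ.

# One supervised example: an audio clip plus a text instruction in, a text target out.
# "samples/utterance_001.wav" and the assistant reply are hypothetical.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio": "samples/utterance_001.wav"},
            {"type": "text", "text": "Transcribe this recording and answer in the speaker's language."},
        ],
    },
    {
        "role": "assistant",
        "content": [{"type": "text", "text": "Sure, here is the transcription ..."}],
    },
]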

Only fine-tune the thinker module?

I'm looking for video+audio -> text fine-tuning, basically to caption a short video.

Sure! We will support text+audio -> text modality fine-tuning soon. Stay tuned!
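In the meantime, video+audio -> text captioning already works at inference time with the released checkpoint. Here is a sketch following the Qwen2.5-Omni model card usage; the video path is a placeholder, and class/argument names track the pinned transformers commit, so they may differ elsewhere:

from transformers import Qwen2_5OmniModel, Qwen2_5OmniProcessor
from qwen_omni_utils import process_mm_info  # pip install qwen-omni-utils

model = Qwen2_5OmniModel.from_pretrained(
    "Qwen/Qwen2.5-Omni-7B", torch_dtype="auto", device_map="auto"
)
processor = Qwen2_5OmniProcessor.from_pretrained("Qwen/Qwen2.5-Omni-7B")

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "clips/demo.mp4"},  # placeholder path
            {"type": "text", "text": "Caption this video in one sentence."},
        ],
    },
]

text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=True)
inputs = processor(
    text=text, audio=audios, images=images, videos=videos,  # some commits use audios= instead of audio=
    return_tensors="pt", padding=True, use_audio_in_video=True,
).to(model.device)

# return_audio=False skips the talker: we only want a text caption here.
text_ids = model.generate(**inputs, use_audio_in_video=True, return_audio=False)
print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])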

It would be great to have support for "text+audio -> audio (only)" for conversations or call tasks. Thanks
