[Fine-tuning Code] Here is an implementation!
Hi everyone! We are very pleased to announce that align-anything now supports fine-tuning for Qwen2.5-Omni. The code is here: https://github.com/PKU-Alignment/align-anything/pull/169.
Compared to the community's implementation, we believe our solution is more user-friendly: after installation, you just need to run the training script below to start training, without modifying anything!
- Installation:
# We tested on an H800 computing cluster, where this CUDA version works well.
# Adjust the version to match your own cluster's environment.
conda install nvidia/label/cuda-12.2.0::cuda
export CUDA_HOME=$CONDA_PREFIX
cd align-anything
pip install -e .[train]
# For Qwen2.5-Omni, replace the released transformers with the pinned commit below.
pip uninstall -y transformers
pip install git+https://github.com/huggingface/transformers@3a1ead0aabed473eafe527915eea8c197d424356
pip install -U flash-attn --no-build-isolation
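Before launching training, a quick sanity check can confirm that the pinned transformers commit and the CUDA toolchain are active. This is an optional sketch, not part of the official steps; the version strings you see will depend on the pinned commit and your driver.
# Optional sanity check (a suggestion, not part of the official steps).
python -c "import transformers; print(transformers.__version__)"
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
nvcc --version  # should report the CUDA toolkit installed via conda above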
- Training:
cd scripts
bash qwen_omni_sft.sh
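To train on your own checkpoint or dataset, the usual align-anything pattern is to edit the variables at the top of the script. The names below are illustrative assumptions, not verified contents of qwen_omni_sft.sh; check the script itself for the actual ones.
# Hypothetical sketch: these variable names are assumptions, verify them
# against qwen_omni_sft.sh before editing.
MODEL_NAME_OR_PATH="Qwen/Qwen2.5-Omni-7B"  # base checkpoint to fine-tune (assumed)
TRAIN_DATASETS="path/to/your/dataset"      # your SFT data (assumed)
OUTPUT_DIR="../outputs/qwen_omni_sft"      # where checkpoints are written (assumed)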
Any way to fine-tune it for a new voice/language?
Sure! We will support text+audio -> text modality fine-tuning soon. Stay tuned!
Only fine-tune the thinker module?
I'm looking for video+audio -> text fine-tuning, basically to caption a short video.
It would be great to have support for "text+audio -> audio (only)" for conversations or call tasks. Thanks