Training config: {'algorithm': <Algorithm.DPO: 'dpo'>,
 'args': {'batch_size': 1,
          'beta': 0.1,
          'learning_rate': 1e-05,
          'num_epochs': 1,
          'output_dir': 'experiments/fd090800-77d0-4c51-aa81-68b862dc3822'},
 'dataset': {'eval_dataset': None,
             'eval_split': 'test',
             'max_prompt_length': 1024,
             'max_seq_length': 2048,
             'train_dataset': 'phunguyen01/test-dpo-data',
             'train_split': 'train'},
 'model': {'lora_alpha': 32,
           'lora_r': 16,
           'model_name_or_path': 'checkpoints/Qwen2.5-0.5B-Instruct',
           'use_peft': False}}
COMMAND: accelerate launch -m --config_file integration/third_party/accelerate_configs/multi_gpu.yaml --num_processes 1 integration.third_party.trl.run_dpo experiments/fd090800-77d0-4c51-aa81-68b862dc3822/training_config.yaml > experiments/fd090800-77d0-4c51-aa81-68b862dc3822/training.log 2>&1
WANDB is not running, waiting for 5 seconds...
WANDB is not running, waiting for 5 seconds...
WANDB is running, updating the training job...