quiin-dpo-test / api.log
Training config: {'algorithm': <Algorithm.DPO: 'dpo'>,
 'args': {'batch_size': 1,
          'beta': 0.1,
          'learning_rate': 1e-05,
          'num_epochs': 1,
          'output_dir': 'experiments/fd090800-77d0-4c51-aa81-68b862dc3822'},
 'dataset': {'eval_dataset': None,
             'eval_split': 'test',
             'max_prompt_length': 1024,
             'max_seq_length': 2048,
             'train_dataset': 'phunguyen01/test-dpo-data',
             'train_split': 'train'},
 'model': {'lora_alpha': 32,
           'lora_r': 16,
           'model_name_or_path': 'checkpoints/Qwen2.5-0.5B-Instruct',
           'use_peft': False}}
COMMAND: accelerate launch -m --config_file integration/third_party/accelerate_configs/multi_gpu.yaml --num_processes 1 integration.third_party.trl.run_dpo experiments/fd090800-77d0-4c51-aa81-68b862dc3822/training_config.yaml > experiments/fd090800-77d0-4c51-aa81-68b862dc3822/training.log 2>&1
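The COMMAND line above could be assembled and run along the following lines. This is a minimal sketch, not the service's actual code: the function names `build_launch_command` and `run_training` are hypothetical, and only the paths and flags visible in the log are taken from the source. Passing an argument list and redirecting both output streams to `training.log` mirrors the shell's `> ... 2>&1` without invoking a shell.

```python
import shlex
import subprocess
from pathlib import Path


def build_launch_command(
    experiment_dir,
    num_processes=1,
    accelerate_config="integration/third_party/accelerate_configs/multi_gpu.yaml",
    module="integration.third_party.trl.run_dpo",
):
    """Return (argv, log_path) for a run like the one in the log (hypothetical helper)."""
    exp = Path(experiment_dir)
    argv = [
        "accelerate", "launch", "-m",
        "--config_file", accelerate_config,
        "--num_processes", str(num_processes),
        module,
        (exp / "training_config.yaml").as_posix(),
    ]
    return argv, exp / "training.log"


def run_training(argv, log_path):
    """Run the command, sending stdout and stderr to log_path (like `> file 2>&1`)."""
    with open(log_path, "w") as log_file:
        completed = subprocess.run(argv, stdout=log_file, stderr=subprocess.STDOUT)
    return completed.returncode


argv, log_path = build_launch_command(
    "experiments/fd090800-77d0-4c51-aa81-68b862dc3822"
)
# Show the shell-equivalent command without executing it.
print(shlex.join(argv), ">", log_path.as_posix(), "2>&1")
```

Only `build_launch_command` is exercised here; `run_training` would need `accelerate` installed and the experiment directory to exist.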
WANDB is not running, waiting for 5 seconds...
WANDB is not running, waiting for 5 seconds...
WANDB is running, updating the training job...
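The three WANDB lines above are consistent with a poll-and-retry loop that checks every 5 seconds until the run is visible. A minimal sketch of such a loop follows, under stated assumptions: `is_running` is a hypothetical callable standing in for whatever check the service performs, and `max_attempts` is an added safeguard that the log does not show.

```python
import time
from typing import Callable


def wait_until_running(
    is_running: Callable[[], bool],
    interval_s: float = 5.0,
    max_attempts: int = 10,  # assumption: the real service may retry indefinitely
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> bool:
    """Poll is_running() until it returns True or attempts run out."""
    for _ in range(max_attempts):
        if is_running():
            log("WANDB is running, updating the training job...")
            return True
        log(f"WANDB is not running, waiting for {interval_s:g} seconds...")
        sleep(interval_s)
    return False


# Usage with a stand-in check that succeeds on the third poll.
states = iter([False, False, True])
wait_until_running(lambda: next(states), sleep=lambda _seconds: None)
```

Injecting `sleep` and `log` keeps the loop testable without real delays; with the stand-in check above it prints the same three messages as the log.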