0%| | 0/478 [00:00<?, ?it/s] |
|
[WARNING|modeling_utils.py:1188] 2024-04-26 15:57:21,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed |
|
0%| | 2/478 [00:03<12:43, 1.60s/it] |
|
5%|█ | 25/478 [00:32<09:39, 1.28s/it] |
|
10%|█ | 50/478 [01:05<09:13, 1.29s/it] |
|
16%|██ | 75/478 [01:37<08:37, 1.28s/it] |
|
21%|██ | 100/478 [02:09<08:04, 1.28s/it] |
|
[INFO|trainer.py:3614] 2024-04-26 15:59:29,412 >> ***** Running Evaluation ***** |
|
[INFO|trainer.py:3616] 2024-04-26 15:59:29,412 >> Num examples = 2000 |
|
[INFO|trainer.py:3619] 2024-04-26 15:59:29,412 >> Batch size = 8 |
|
6%|█ | 2/32 [00:00<00:03, 8.89it/s] |
|
[INFO|configuration_utils.py:471] 2024-04-26 15:59:37,711 >> Configuration saved in ./checkpoint-100/config.json |
|
[INFO|configuration_utils.py:697] 2024-04-26 15:59:37,713 >> Configuration saved in ./checkpoint-100/generation_config.json |
|
{'eval_loss': 0.6759119629859924, 'eval_runtime': 8.2733, 'eval_samples_per_second': 241.742, 'eval_steps_per_second': 3.868, 'eval_rewards/chosen': 0.0017230990342795849, 'eval_rewards/rejected': -0.03281649947166443, 'eval_rewards/accuracies': 0.62890625, 'eval_rewards/margins': 0.0345395989716053, 'eval_logps/rejected': -407.8036804199219, 'eval_logps/chosen': -423.0196533203125, 'eval_logits/rejected': -3.2565112113952637, 'eval_logits/chosen': -3.313567638397217, 'epoch': 0.21} |
|
[INFO|modeling_utils.py:2598] 2024-04-26 15:59:47,330 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./checkpoint-100/model.safetensors.index.json. |
|
[INFO|tokenization_utils_base.py:2488] 2024-04-26 15:59:47,344 >> tokenizer config file saved in ./checkpoint-100/tokenizer_config.json |
|
[INFO|tokenization_utils_base.py:2497] 2024-04-26 15:59:47,384 >> Special tokens file saved in ./checkpoint-100/special_tokens_map.json |
|
[INFO|tokenization_utils_base.py:2488] 2024-04-26 16:00:07,103 >> tokenizer config file saved in ./tokenizer_config.json |
|
[INFO|tokenization_utils_base.py:2497] 2024-04-26 16:00:07,105 >> Special tokens file saved in ./special_tokens_map.json |
|
26%|███ | 126/478 [03:19<07:28, 1.27s/it] |
|
32%|████ | 151/478 [03:51<06:56, 1.28s/it] |
|
37%|████ | 176/478 [04:23<06:28, 1.29s/it] |
|
42%|█████ | 200/478 [04:54<05:52, 1.27s/it] |
|
[INFO|trainer.py:3614] 2024-04-26 16:02:14,470 >> ***** Running Evaluation ***** |
|
[INFO|trainer.py:3616] 2024-04-26 16:02:14,470 >> Num examples = 2000 |
|
[INFO|trainer.py:3619] 2024-04-26 16:02:14,470 >> Batch size = 8 |
|
19%|██ | 6/32 [00:01<00:06, 4.22it/s] |
|
[INFO|configuration_utils.py:471] 2024-04-26 16:02:22,770 >> Configuration saved in ./checkpoint-200/config.json |
|
[INFO|configuration_utils.py:697] 2024-04-26 16:02:22,773 >> Configuration saved in ./checkpoint-200/generation_config.json |
|
{'eval_loss': 0.6533502340316772, 'eval_runtime': 8.2763, 'eval_samples_per_second': 241.653, 'eval_steps_per_second': 3.866, 'eval_rewards/chosen': -0.06664139777421951, 'eval_rewards/rejected': -0.16173213720321655, 'eval_rewards/accuracies': 0.64453125, 'eval_rewards/margins': 0.09509073942899704, 'eval_logps/rejected': -420.6952209472656, 'eval_logps/chosen': -429.85614013671875, 'eval_logits/rejected': -3.2240023612976074, 'eval_logits/chosen': -3.2767982482910156, 'epoch': 0.42} |
|
[INFO|modeling_utils.py:2598] 2024-04-26 16:02:32,167 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./checkpoint-200/model.safetensors.index.json. |
|
[INFO|tokenization_utils_base.py:2488] 2024-04-26 16:02:32,170 >> tokenizer config file saved in ./checkpoint-200/tokenizer_config.json |
|
[INFO|tokenization_utils_base.py:2497] 2024-04-26 16:02:32,172 >> Special tokens file saved in ./checkpoint-200/special_tokens_map.json |
|
[INFO|tokenization_utils_base.py:2488] 2024-04-26 16:02:50,674 >> tokenizer config file saved in ./tokenizer_config.json |
|
[INFO|tokenization_utils_base.py:2497] 2024-04-26 16:02:50,676 >> Special tokens file saved in ./special_tokens_map.json |
|
[INFO|trainer.py:3397] 2024-04-26 16:02:50,704 >> Deleting older checkpoint [checkpoint-100] due to args.save_total_limit |
|
47%|█████ | 226/478 [06:05<05:20, 1.27s/it] |
|
52%|██████ | 250/478 [06:36<05:00, 1.32s/it] |
|
58%|██████ | 275/478 [07:09<04:25, 1.31s/it] |
|
63%|███████ | 300/478 [07:42<03:53, 1.31s/it] |
|
[INFO|trainer.py:3614] 2024-04-26 16:05:02,769 >> ***** Running Evaluation ***** |
|
[INFO|trainer.py:3616] 2024-04-26 16:05:02,769 >> Num examples = 2000 |
|
[INFO|trainer.py:3619] 2024-04-26 16:05:02,769 >> Batch size = 8 |
|
12%|██ | 4/32 [00:00<00:05, 5.09it/s] |
|
[INFO|configuration_utils.py:471] 2024-04-26 16:05:11,146 >> Configuration saved in ./checkpoint-300/config.json |
|
[INFO|configuration_utils.py:697] 2024-04-26 16:05:11,149 >> Configuration saved in ./checkpoint-300/generation_config.json |
|
{'eval_loss': 0.6438009142875671, 'eval_runtime': 8.3559, 'eval_samples_per_second': 239.351, 'eval_steps_per_second': 3.83, 'eval_rewards/chosen': -0.10771973431110382, 'eval_rewards/rejected': -0.24101632833480835, 'eval_rewards/accuracies': 0.62109375, 'eval_rewards/margins': 0.13329659402370453, 'eval_logps/rejected': -428.6236572265625, 'eval_logps/chosen': -433.9639892578125, 'eval_logits/rejected': -3.2049574851989746, 'eval_logits/chosen': -3.2553329467773438, 'epoch': 0.63} |
|
[INFO|modeling_utils.py:2598] 2024-04-26 16:05:20,618 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./checkpoint-300/model.safetensors.index.json. |
|
[INFO|tokenization_utils_base.py:2488] 2024-04-26 16:05:20,621 >> tokenizer config file saved in ./checkpoint-300/tokenizer_config.json |
|
[INFO|tokenization_utils_base.py:2497] 2024-04-26 16:05:20,623 >> Special tokens file saved in ./checkpoint-300/special_tokens_map.json |
|
[INFO|tokenization_utils_base.py:2488] 2024-04-26 16:05:39,112 >> tokenizer config file saved in ./tokenizer_config.json |
|
[INFO|tokenization_utils_base.py:2497] 2024-04-26 16:05:39,114 >> Special tokens file saved in ./special_tokens_map.json |
|
[INFO|trainer.py:3397] 2024-04-26 16:05:39,143 >> Deleting older checkpoint [checkpoint-200] due to args.save_total_limit |
|
68%|███████ | 325/478 [08:53<03:20, 1.31s/it] |
|
73%|████████ | 351/478 [09:27<02:47, 1.32s/it] |
|
79%|████████ | 376/478 [10:01<02:20, 1.37s/it] |
|
84%|█████████ | 400/478 [10:34<01:45, 1.35s/it] |
|
[INFO|trainer.py:3614] 2024-04-26 16:07:54,737 >> ***** Running Evaluation ***** |
|
[INFO|trainer.py:3616] 2024-04-26 16:07:54,737 >> Num examples = 2000 |
|
[INFO|trainer.py:3619] 2024-04-26 16:07:54,737 >> Batch size = 8 |
|
12%|██ | 4/32 [00:00<00:07, 3.96it/s] |
|
[INFO|configuration_utils.py:471] 2024-04-26 16:08:03,486 >> Configuration saved in ./checkpoint-400/config.json |
|
[INFO|configuration_utils.py:697] 2024-04-26 16:08:03,490 >> Configuration saved in ./checkpoint-400/generation_config.json |
|
{'eval_loss': 0.6415477395057678, 'eval_runtime': 8.7287, 'eval_samples_per_second': 229.13, 'eval_steps_per_second': 3.666, 'eval_rewards/chosen': -0.10007989406585693, 'eval_rewards/rejected': -0.24366310238838196, 'eval_rewards/accuracies': 0.62109375, 'eval_rewards/margins': 0.14358317852020264, 'eval_logps/rejected': -428.88836669921875, 'eval_logps/chosen': -433.20001220703125, 'eval_logits/rejected': -3.204622507095337, 'eval_logits/chosen': -3.254263401031494, 'epoch': 0.84} |
|
[INFO|modeling_utils.py:2598] 2024-04-26 16:08:13,074 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./checkpoint-400/model.safetensors.index.json. |
|
[INFO|tokenization_utils_base.py:2488] 2024-04-26 16:08:13,077 >> tokenizer config file saved in ./checkpoint-400/tokenizer_config.json |
|
[INFO|tokenization_utils_base.py:2497] 2024-04-26 16:08:13,079 >> Special tokens file saved in ./checkpoint-400/special_tokens_map.json |
|
[INFO|tokenization_utils_base.py:2488] 2024-04-26 16:08:31,839 >> tokenizer config file saved in ./tokenizer_config.json |
|
[INFO|tokenization_utils_base.py:2497] 2024-04-26 16:08:31,841 >> Special tokens file saved in ./special_tokens_map.json |
|
[INFO|trainer.py:3397] 2024-04-26 16:08:31,870 >> Deleting older checkpoint [checkpoint-300] due to args.save_total_limit |
|
89%|█████████ | 425/478 [11:46<01:11, 1.35s/it] |
|
94%|██████████| 450/478 [12:20<00:37, 1.35s/it] |
|
99%|██████████| 475/478 [12:54<00:04, 1.35s/it] |
|
100%|██████████| 478/478 [12:58<00:00, 1.35s/it] |
|
[INFO|trainer.py:2316] 2024-04-26 16:10:19,036 >> Training completed. Do not forget to share your model on huggingface.co/models =) |
|
100%|██████████| 478/478 [12:58<00:00, 1.63s/it] |
|
[INFO|trainer.py:3614] 2024-04-26 16:10:19,102 >> ***** Running Evaluation ***** |
|
[INFO|trainer.py:3616] 2024-04-26 16:10:19,102 >> Num examples = 2000 |
|
[INFO|trainer.py:3619] 2024-04-26 16:10:19,102 >> Batch size = 8 |
|
12%|██ | 4/32 [00:00<00:05, 5.08it/s] |
|
{'train_runtime': 784.6622, 'train_samples_per_second': 77.913, 'train_steps_per_second': 0.609, 'train_loss': 0.6571792745689967, 'epoch': 1.0} |
|
***** train metrics ***** |
|
epoch = 1.0 |
|
total_flos = 0GF |
|
train_loss = 0.6572 |
|
train_runtime = 0:13:04.66 |
|
train_samples = 61135 |
|
train_samples_per_second = 77.913 |
|
train_steps_per_second = 0.609 |
|
2024-04-26 16:10:19 - INFO - __main__ - *** Training complete *** |
|
100%|██████████| 32/32 [00:08<00:00, 3.97it/s] |
|
[INFO|trainer.py:3305] 2024-04-26 16:10:27,430 >> Saving model checkpoint to ./ |
|
[INFO|configuration_utils.py:471] 2024-04-26 16:10:27,432 >> Configuration saved in ./config.json |
|
[INFO|configuration_utils.py:697] 2024-04-26 16:10:27,434 >> Configuration saved in ./generation_config.json |
|
***** eval metrics ***** |
|
epoch = 1.0 |
|
eval_logits/chosen = -3.2544 |
|
eval_logits/rejected = -3.2047 |
|
eval_logps/chosen = -433.6304 |
|
eval_logps/rejected = -429.4582 |
|
eval_loss = 0.6412 |
|
eval_rewards/accuracies = 0.6445 |
|
eval_rewards/chosen = -0.1044 |
|
eval_rewards/margins = 0.145 |
|
eval_rewards/rejected = -0.2494 |
|
eval_runtime = 0:00:08.29 |
|
eval_samples = 2000 |
|
eval_samples_per_second = 241.204 |
|
eval_steps_per_second = 3.859 |
|
2024-04-26 16:10:27 - INFO - __main__ - *** Save model *** |
|
[INFO|modeling_utils.py:2598] 2024-04-26 16:10:37,122 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./model.safetensors.index.json. |
|
[INFO|tokenization_utils_base.py:2488] 2024-04-26 16:10:37,133 >> tokenizer config file saved in ./tokenizer_config.json |
|
[INFO|tokenization_utils_base.py:2497] 2024-04-26 16:10:37,135 >> Special tokens file saved in ./special_tokens_map.json |
|
[INFO|trainer.py:3305] 2024-04-26 16:10:37,190 >> Saving model checkpoint to ./ |
|
[INFO|configuration_utils.py:471] 2024-04-26 16:10:37,192 >> Configuration saved in ./config.json |
|
[INFO|configuration_utils.py:697] 2024-04-26 16:10:37,194 >> Configuration saved in ./generation_config.json |
|
[INFO|modeling_utils.py:2598] 2024-04-26 16:10:48,100 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./model.safetensors.index.json. |
|
[INFO|tokenization_utils_base.py:2488] 2024-04-26 16:10:48,103 >> tokenizer config file saved in ./tokenizer_config.json |
|
[INFO|tokenization_utils_base.py:2497] 2024-04-26 16:10:48,105 >> Special tokens file saved in ./special_tokens_map.json |
|
[INFO|modelcard.py:450] 2024-04-26 16:10:48,202 >> Dropping the following result as it does not have all the necessary fields: |
|
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}} |
|
events.out.tfevents.1714147827.ip-26-0-160-225.711598.1: 100%|██████████| 828/828 [00:00<00:00, 5.29kB/s] |
|
events.out.tfevents.1714147034.ip-26-0-160-225.711598.0: 100%|██████████| 21.8k/21.8k [00:00<00:00, 108kB/s] |
|
model-00001-of-00002.safetensors: 1%| | 32.0M/4.99G [00:00<02:33, 32.3MB/s] |
|
events.out.tfevents.1714147034.ip-26-0-160-225.711598.0: 0%| | 0.00/21.8k [00:00<?, ?B/s] |
|
Upload 4 LFS files: 100%|██████████| 4/4 [01:26<00:00, 21.75s/it] |
|
[INFO|modelcard.py:450] 2024-04-26 16:12:22,694 >> Dropping the following result as it does not have all the necessary fields: |
|
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}, 'dataset': {'name': 'HuggingFaceH4/ultrafeedback_binarized', 'type': 'HuggingFaceH4/ultrafeedback_binarized'}} |
|
[INFO|configuration_utils.py:471] 2024-04-26 16:12:22,700 >> Configuration saved in ./config.json |
|
[INFO|trainer.py:3305] 2024-04-26 16:12:22,704 >> Saving model checkpoint to ./ |
|
[INFO|configuration_utils.py:471] 2024-04-26 16:12:22,706 >> Configuration saved in ./config.json |
|
[INFO|configuration_utils.py:697] 2024-04-26 16:12:22,708 >> Configuration saved in ./generation_config.json |
|
2024-04-26 16:12:22 - INFO - __main__ - Model saved to ./ |
|
2024-04-26 16:12:22 - INFO - __main__ - Pushing to hub... |
|
[INFO|modeling_utils.py:2598] 2024-04-26 16:12:33,635 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./model.safetensors.index.json. |
|
[INFO|tokenization_utils_base.py:2488] 2024-04-26 16:12:33,638 >> tokenizer config file saved in ./tokenizer_config.json |
|
[INFO|tokenization_utils_base.py:2497] 2024-04-26 16:12:33,640 >> Special tokens file saved in ./special_tokens_map.json |
|
[INFO|modelcard.py:450] 2024-04-26 16:12:33,786 >> Dropping the following result as it does not have all the necessary fields: |
|
|