[WARNING|2024-12-17 12:49:56] logging.py:162 >> We recommend enable mixed precision training.
[INFO|2024-12-17 12:49:56] parser.py:355 >> Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.bfloat16
[INFO|2024-12-17 12:49:56] configuration_utils.py:677 >> loading configuration file /media/omnisky/Extreme SSD/hzq/LLMmodels/Qwen2-7B-Instruct/config.json
[INFO|2024-12-17 12:49:56] configuration_utils.py:746 >> Model config Qwen2Config {
  "_name_or_path": "/media/omnisky/Extreme SSD/hzq/LLMmodels/Qwen2-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.46.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
[INFO|2024-12-17 12:49:56] tokenization_utils_base.py:2209 >> loading file vocab.json
[INFO|2024-12-17 12:49:56] tokenization_utils_base.py:2209 >> loading file merges.txt
[INFO|2024-12-17 12:49:56] tokenization_utils_base.py:2209 >> loading file tokenizer.json
[INFO|2024-12-17 12:49:56] tokenization_utils_base.py:2209 >> loading file added_tokens.json
[INFO|2024-12-17 12:49:56] tokenization_utils_base.py:2209 >> loading file special_tokens_map.json
[INFO|2024-12-17 12:49:56] tokenization_utils_base.py:2209 >> loading file tokenizer_config.json
[INFO|2024-12-17 12:49:57] tokenization_utils_base.py:2475 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2024-12-17 12:49:57] logging.py:157 >> Replace eos token: <|im_end|>
[INFO|2024-12-17 12:49:57] logging.py:157 >> Loading dataset qwendatacollect.json...
[INFO|2024-12-17 12:50:03] configuration_utils.py:677 >> loading configuration file /media/omnisky/Extreme SSD/hzq/LLMmodels/Qwen2-7B-Instruct/config.json
[INFO|2024-12-17 12:50:03] modeling_utils.py:3934 >> loading weights file /media/omnisky/Extreme SSD/hzq/LLMmodels/Qwen2-7B-Instruct/model.safetensors.index.json
[INFO|2024-12-17 12:50:03] modeling_utils.py:1670 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
[INFO|2024-12-17 12:50:03] configuration_utils.py:1096 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645
}
[INFO|2024-12-17 12:50:06] modeling_utils.py:4800 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.
[INFO|2024-12-17 12:50:06] modeling_utils.py:4808 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /media/omnisky/Extreme SSD/hzq/LLMmodels/Qwen2-7B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
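For reference, the load phase above corresponds to standard transformers calls. A minimal sketch (not the exact LLaMA-Factory code path); the local model path and the bfloat16 compute dtype are taken from the log, everything else is an assumption:

```python
# Minimal sketch of the load step reflected in the log above (not the exact
# LLaMA-Factory code). Path and dtype come from the log; device is an assumption.
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_path = "/media/omnisky/Extreme SSD/hzq/LLMmodels/Qwen2-7B-Instruct"

config = AutoConfig.from_pretrained(model_path)        # reads config.json -> Qwen2Config
tokenizer = AutoTokenizer.from_pretrained(model_path)  # vocab.json, merges.txt, tokenizer.json, ...
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # matches "compute dtype: torch.bfloat16"
).to("cuda:0")                   # single GPU, as in the log (n_gpu: 1)
```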
[INFO|2024-12-17 12:50:06] configuration_utils.py:1049 >> loading configuration file /media/omnisky/Extreme SSD/hzq/LLMmodels/Qwen2-7B-Instruct/generation_config.json
[INFO|2024-12-17 12:50:06] configuration_utils.py:1096 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.05,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}
[INFO|2024-12-17 12:50:06] logging.py:157 >> Gradient checkpointing enabled.
[INFO|2024-12-17 12:50:06] logging.py:157 >> Using torch SDPA for faster training and inference.
[INFO|2024-12-17 12:50:06] logging.py:157 >> Pure bf16 / BAdam detected, remaining trainable params in half precision.
[INFO|2024-12-17 12:50:06] logging.py:157 >> Fine-tuning method: LoRA
[INFO|2024-12-17 12:50:06] logging.py:157 >> Found linear modules: gate_proj,o_proj,down_proj,up_proj,v_proj,k_proj,q_proj
[INFO|2024-12-17 12:50:06] logging.py:157 >> trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643
[INFO|2024-12-17 12:50:06] trainer.py:2313 >> ***** Running training *****
[INFO|2024-12-17 12:50:06] trainer.py:2314 >> Num examples = 3,354
[INFO|2024-12-17 12:50:06] trainer.py:2315 >> Num Epochs = 3
[INFO|2024-12-17 12:50:06] trainer.py:2316 >> Instantaneous batch size per device = 1
[INFO|2024-12-17 12:50:06] trainer.py:2319 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|2024-12-17 12:50:06] trainer.py:2320 >> Gradient Accumulation steps = 8
[INFO|2024-12-17 12:50:06] trainer.py:2321 >> Total optimization steps = 1,257
[INFO|2024-12-17 12:50:06] trainer.py:2322 >> Number of trainable parameters = 20,185,088
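The setup block above fixes most of the run configuration: LoRA on all seven linear projections, bf16 weights, gradient checkpointing, and a per-device batch size of 1 with 8 gradient-accumulation steps, i.e. an effective batch size of 8. The step count also checks out: 3,354 examples give 3,354 batches per epoch, 3,354 // 8 = 419 optimizer updates per epoch, and 3 x 419 = 1,257 total optimization steps as logged. A minimal PEFT sketch consistent with these numbers; the rank is inferred (r=8 on these modules reproduces the logged 20,185,088 trainable parameters), while lora_alpha and lora_dropout are assumptions, since they are not recorded in the log:

```python
# Hedged sketch of the LoRA setup implied by the log above, not the exact
# LLaMA-Factory code. r=8 is inferred from the trainable-parameter count;
# lora_alpha and lora_dropout are assumptions.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "/media/omnisky/Extreme SSD/hzq/LLMmodels/Qwen2-7B-Instruct",
    torch_dtype=torch.bfloat16,
)
model.gradient_checkpointing_enable()  # "Gradient checkpointing enabled."

lora_config = LoraConfig(
    r=8,             # inferred: reproduces the logged 20,185,088 trainable params
    lora_alpha=16,   # assumption: not recorded in the log
    lora_dropout=0.0,  # assumption: not recorded in the log
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # "Found linear modules: ..."
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643
```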
[INFO|2024-12-17 12:50:27] logging.py:157 >> {'loss': 1.4421, 'learning_rate': 4.9998e-05, 'epoch': 0.01}
[INFO|2024-12-17 12:50:47] logging.py:157 >> {'loss': 1.3306, 'learning_rate': 4.9992e-05, 'epoch': 0.02}
[INFO|2024-12-17 12:51:07] logging.py:157 >> {'loss': 1.1873, 'learning_rate': 4.9982e-05, 'epoch': 0.04}
[INFO|2024-12-17 12:51:27] logging.py:157 >> {'loss': 1.0583, 'learning_rate': 4.9969e-05, 'epoch': 0.05}
[INFO|2024-12-17 12:51:48] logging.py:157 >> {'loss': 1.0045, 'learning_rate': 4.9951e-05, 'epoch': 0.06}
[INFO|2024-12-17 12:52:08] logging.py:157 >> {'loss': 0.9836, 'learning_rate': 4.9930e-05, 'epoch': 0.07}
[INFO|2024-12-17 12:52:28] logging.py:157 >> {'loss': 0.8925, 'learning_rate': 4.9904e-05, 'epoch': 0.08}
[INFO|2024-12-17 12:52:48] logging.py:157 >> {'loss': 0.8783, 'learning_rate': 4.9875e-05, 'epoch': 0.10}
[INFO|2024-12-17 12:53:08] logging.py:157 >> {'loss': 0.8681, 'learning_rate': 4.9842e-05, 'epoch': 0.11}
[INFO|2024-12-17 12:53:28] logging.py:157 >> {'loss': 0.8251, 'learning_rate': 4.9805e-05, 'epoch': 0.12}
[INFO|2024-12-17 12:53:48] logging.py:157 >> {'loss': 0.8079, 'learning_rate': 4.9764e-05, 'epoch': 0.13}
[INFO|2024-12-17 12:54:08] logging.py:157 >> {'loss': 0.7769, 'learning_rate': 4.9719e-05, 'epoch': 0.14}
[INFO|2024-12-17 12:54:29] logging.py:157 >> {'loss': 0.7770, 'learning_rate': 4.9671e-05, 'epoch': 0.16}
[INFO|2024-12-17 12:54:49] logging.py:157 >> {'loss': 0.7303, 'learning_rate': 4.9618e-05, 'epoch': 0.17}
[INFO|2024-12-17 12:55:09] logging.py:157 >> {'loss': 0.7442, 'learning_rate': 4.9562e-05, 'epoch': 0.18}
[INFO|2024-12-17 12:55:29] logging.py:157 >> {'loss': 0.7077, 'learning_rate': 4.9502e-05, 'epoch': 0.19}
[INFO|2024-12-17 12:55:49] logging.py:157 >> {'loss': 0.6730, 'learning_rate': 4.9438e-05, 'epoch': 0.20}
[INFO|2024-12-17 12:56:09] logging.py:157 >> {'loss': 0.7109, 'learning_rate': 4.9370e-05, 'epoch': 0.21}
[INFO|2024-12-17 12:56:29] logging.py:157 >> {'loss': 0.6798, 'learning_rate': 4.9299e-05, 'epoch': 0.23}
[INFO|2024-12-17 12:56:50] logging.py:157 >> {'loss': 0.6471, 'learning_rate': 4.9223e-05, 'epoch': 0.24}
[INFO|2024-12-17 12:56:50] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-100
[INFO|2024-12-17 12:56:50] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-100/tokenizer_config.json
[INFO|2024-12-17 12:56:50] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-100/special_tokens_map.json
[INFO|2024-12-17 12:57:10] logging.py:157 >> {'loss': 0.6456, 'learning_rate': 4.9144e-05, 'epoch': 0.25}
[INFO|2024-12-17 12:57:31] logging.py:157 >> {'loss': 0.6462, 'learning_rate': 4.9061e-05, 'epoch': 0.26}
[INFO|2024-12-17 12:57:51] logging.py:157 >> {'loss': 0.6594, 'learning_rate': 4.8974e-05, 'epoch': 0.27}
[INFO|2024-12-17 12:58:11] logging.py:157 >> {'loss': 0.6372, 'learning_rate': 4.8884e-05, 'epoch': 0.29}
[INFO|2024-12-17 12:58:31] logging.py:157 >> {'loss': 0.6594, 'learning_rate': 4.8790e-05, 'epoch': 0.30}
[INFO|2024-12-17 12:58:52] logging.py:157 >> {'loss': 0.6544, 'learning_rate': 4.8692e-05, 'epoch': 0.31}
[INFO|2024-12-17 12:59:12] logging.py:157 >> {'loss': 0.6275, 'learning_rate': 4.8590e-05, 'epoch': 0.32}
[INFO|2024-12-17 12:59:32] logging.py:157 >> {'loss': 0.6348, 'learning_rate': 4.8485e-05, 'epoch': 0.33}
[INFO|2024-12-17 12:59:53] logging.py:157 >> {'loss': 0.6045, 'learning_rate': 4.8376e-05, 'epoch': 0.35}
[INFO|2024-12-17 13:00:13] logging.py:157 >> {'loss': 0.6487, 'learning_rate': 4.8264e-05, 'epoch': 0.36}
[INFO|2024-12-17 13:00:33] logging.py:157 >> {'loss': 0.6026, 'learning_rate': 4.8147e-05, 'epoch': 0.37}
[INFO|2024-12-17 13:00:53] logging.py:157 >> {'loss': 0.6083, 'learning_rate': 4.8028e-05, 'epoch': 0.38}
[INFO|2024-12-17 13:01:14] logging.py:157 >> {'loss': 0.5957, 'learning_rate': 4.7904e-05, 'epoch': 0.39}
[INFO|2024-12-17 13:01:34] logging.py:157 >> {'loss': 0.6538, 'learning_rate': 4.7777e-05, 'epoch': 0.41}
[INFO|2024-12-17 13:01:54] logging.py:157 >> {'loss': 0.6200, 'learning_rate': 4.7647e-05, 'epoch': 0.42}
[INFO|2024-12-17 13:02:14] logging.py:157 >> {'loss': 0.6461, 'learning_rate': 4.7513e-05, 'epoch': 0.43}
[INFO|2024-12-17 13:02:35] logging.py:157 >> {'loss': 0.6330, 'learning_rate': 4.7375e-05, 'epoch': 0.44}
[INFO|2024-12-17 13:02:55] logging.py:157 >> {'loss': 0.6330, 'learning_rate': 4.7234e-05, 'epoch': 0.45}
[INFO|2024-12-17 13:03:15] logging.py:157 >> {'loss': 0.6082, 'learning_rate': 4.7089e-05, 'epoch': 0.47}
[INFO|2024-12-17 13:03:35] logging.py:157 >> {'loss': 0.6176, 'learning_rate': 4.6941e-05, 'epoch': 0.48}
[INFO|2024-12-17 13:03:35] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-200
[INFO|2024-12-17 13:03:36] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-200/tokenizer_config.json
[INFO|2024-12-17 13:03:36] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-200/special_tokens_map.json
[INFO|2024-12-17 13:03:56] logging.py:157 >> {'loss': 0.6168, 'learning_rate': 4.6790e-05, 'epoch': 0.49}
[INFO|2024-12-17 13:04:17] logging.py:157 >> {'loss': 0.6454, 'learning_rate': 4.6635e-05, 'epoch': 0.50}
[INFO|2024-12-17 13:04:37] logging.py:157 >> {'loss': 0.5738, 'learning_rate': 4.6477e-05, 'epoch': 0.51}
[INFO|2024-12-17 13:04:57] logging.py:157 >> {'loss': 0.5978, 'learning_rate': 4.6315e-05, 'epoch': 0.52}
[INFO|2024-12-17 13:05:17] logging.py:157 >> {'loss': 0.6375, 'learning_rate': 4.6150e-05, 'epoch': 0.54}
[INFO|2024-12-17 13:05:38] logging.py:157 >> {'loss': 0.6060, 'learning_rate': 4.5982e-05, 'epoch': 0.55}
[INFO|2024-12-17 13:05:58] logging.py:157 >> {'loss': 0.6261, 'learning_rate': 4.5811e-05, 'epoch': 0.56}
[INFO|2024-12-17 13:06:18] logging.py:157 >> {'loss': 0.5877, 'learning_rate': 4.5636e-05, 'epoch': 0.57}
[INFO|2024-12-17 13:06:38] logging.py:157 >> {'loss': 0.5821, 'learning_rate': 4.5458e-05, 'epoch': 0.58}
[INFO|2024-12-17 13:06:59] logging.py:157 >> {'loss': 0.6126, 'learning_rate': 4.5277e-05, 'epoch': 0.60}
[INFO|2024-12-17 13:07:19] logging.py:157 >> {'loss': 0.6034, 'learning_rate': 4.5092e-05, 'epoch': 0.61}
[INFO|2024-12-17 13:07:39] logging.py:157 >> {'loss': 0.6088, 'learning_rate': 4.4905e-05, 'epoch': 0.62}
[INFO|2024-12-17 13:07:59] logging.py:157 >> {'loss': 0.6067, 'learning_rate': 4.4714e-05, 'epoch': 0.63}
[INFO|2024-12-17 13:08:19] logging.py:157 >> {'loss': 0.6334, 'learning_rate': 4.4521e-05, 'epoch': 0.64}
[INFO|2024-12-17 13:08:39] logging.py:157 >> {'loss': 0.6093, 'learning_rate': 4.4324e-05, 'epoch': 0.66}
[INFO|2024-12-17 13:09:00] logging.py:157 >> {'loss': 0.5939, 'learning_rate': 4.4124e-05, 'epoch': 0.67}
[INFO|2024-12-17 13:09:20] logging.py:157 >> {'loss': 0.5483, 'learning_rate': 4.3922e-05, 'epoch': 0.68}
[INFO|2024-12-17 13:09:40] logging.py:157 >> {'loss': 0.5946, 'learning_rate': 4.3716e-05, 'epoch': 0.69}
[INFO|2024-12-17 13:10:00] logging.py:157 >> {'loss': 0.5651, 'learning_rate': 4.3507e-05, 'epoch': 0.70}
[INFO|2024-12-17 13:10:21] logging.py:157 >> {'loss': 0.5965, 'learning_rate': 4.3296e-05, 'epoch': 0.72}
[INFO|2024-12-17 13:10:21] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-300
[INFO|2024-12-17 13:10:21] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-300/tokenizer_config.json
[INFO|2024-12-17 13:10:21] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-300/special_tokens_map.json
[INFO|2024-12-17 13:10:42] logging.py:157 >> {'loss': 0.5751, 'learning_rate': 4.3082e-05, 'epoch': 0.73}
[INFO|2024-12-17 13:11:02] logging.py:157 >> {'loss': 0.6043, 'learning_rate': 4.2864e-05, 'epoch': 0.74}
[INFO|2024-12-17 13:11:22] logging.py:157 >> {'loss': 0.5948, 'learning_rate': 4.2645e-05, 'epoch': 0.75}
[INFO|2024-12-17 13:11:43] logging.py:157 >> {'loss': 0.5538, 'learning_rate': 4.2422e-05, 'epoch': 0.76}
[INFO|2024-12-17 13:12:03] logging.py:157 >> {'loss': 0.5696, 'learning_rate': 4.2196e-05, 'epoch': 0.78}
[INFO|2024-12-17 13:12:23] logging.py:157 >> {'loss': 0.5613, 'learning_rate': 4.1968e-05, 'epoch': 0.79}
[INFO|2024-12-17 13:12:43] logging.py:157 >> {'loss': 0.5712, 'learning_rate': 4.1738e-05, 'epoch': 0.80}
[INFO|2024-12-17 13:13:04] logging.py:157 >> {'loss': 0.5693, 'learning_rate': 4.1504e-05, 'epoch': 0.81}
[INFO|2024-12-17 13:13:24] logging.py:157 >> {'loss': 0.5911, 'learning_rate': 4.1268e-05, 'epoch': 0.82}
[INFO|2024-12-17 13:13:44] logging.py:157 >> {'loss': 0.5551, 'learning_rate': 4.1030e-05, 'epoch': 0.83}
[INFO|2024-12-17 13:14:05] logging.py:157 >> {'loss': 0.5640, 'learning_rate': 4.0789e-05, 'epoch': 0.85}
[INFO|2024-12-17 13:14:25] logging.py:157 >> {'loss': 0.5766, 'learning_rate': 4.0545e-05, 'epoch': 0.86}
[INFO|2024-12-17 13:14:45] logging.py:157 >> {'loss': 0.5289, 'learning_rate': 4.0299e-05, 'epoch': 0.87}
[INFO|2024-12-17 13:15:05] logging.py:157 >> {'loss': 0.5839, 'learning_rate': 4.0051e-05, 'epoch': 0.88}
[INFO|2024-12-17 13:15:25] logging.py:157 >> {'loss': 0.5830, 'learning_rate': 3.9801e-05, 'epoch': 0.89}
[INFO|2024-12-17 13:15:46] logging.py:157 >> {'loss': 0.5645, 'learning_rate': 3.9548e-05, 'epoch': 0.91}
[INFO|2024-12-17 13:16:06] logging.py:157 >> {'loss': 0.6013, 'learning_rate': 3.9292e-05, 'epoch': 0.92}
[INFO|2024-12-17 13:16:26] logging.py:157 >> {'loss': 0.5657, 'learning_rate': 3.9035e-05, 'epoch': 0.93}
[INFO|2024-12-17 13:16:46] logging.py:157 >> {'loss': 0.6077, 'learning_rate': 3.8775e-05, 'epoch': 0.94}
[INFO|2024-12-17 13:17:07] logging.py:157 >> {'loss': 0.5553, 'learning_rate': 3.8514e-05, 'epoch': 0.95}
[INFO|2024-12-17 13:17:07] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-400
[INFO|2024-12-17 13:17:07] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-400/tokenizer_config.json
[INFO|2024-12-17 13:17:07] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-400/special_tokens_map.json
[INFO|2024-12-17 13:17:28] logging.py:157 >> {'loss': 0.5806, 'learning_rate': 3.8250e-05, 'epoch': 0.97}
[INFO|2024-12-17 13:17:48] logging.py:157 >> {'loss': 0.5463, 'learning_rate': 3.7984e-05, 'epoch': 0.98}
[INFO|2024-12-17 13:18:08] logging.py:157 >> {'loss': 0.6032, 'learning_rate': 3.7716e-05, 'epoch': 0.99}
[INFO|2024-12-17 13:18:29] logging.py:157 >> {'loss': 0.6630, 'learning_rate': 3.7446e-05, 'epoch': 1.00}
[INFO|2024-12-17 13:18:49] logging.py:157 >> {'loss': 0.5532, 'learning_rate': 3.7174e-05, 'epoch': 1.01}
[INFO|2024-12-17 13:19:09] logging.py:157 >> {'loss': 0.5370, 'learning_rate': 3.6900e-05, 'epoch': 1.03}
[INFO|2024-12-17 13:19:29] logging.py:157 >> {'loss': 0.5192, 'learning_rate': 3.6624e-05, 'epoch': 1.04}
[INFO|2024-12-17 13:19:50] logging.py:157 >> {'loss': 0.5525, 'learning_rate': 3.6347e-05, 'epoch': 1.05}
[INFO|2024-12-17 13:20:10] logging.py:157 >> {'loss': 0.5546, 'learning_rate': 3.6068e-05, 'epoch': 1.06}
[INFO|2024-12-17 13:20:30] logging.py:157 >> {'loss': 0.5298, 'learning_rate': 3.5787e-05, 'epoch': 1.07}
[INFO|2024-12-17 13:20:50] logging.py:157 >> {'loss': 0.5354, 'learning_rate': 3.5504e-05, 'epoch': 1.09}
[INFO|2024-12-17 13:21:11] logging.py:157 >> {'loss': 0.5173, 'learning_rate': 3.5220e-05, 'epoch': 1.10}
[INFO|2024-12-17 13:21:31] logging.py:157 >> {'loss': 0.5424, 'learning_rate': 3.4934e-05, 'epoch': 1.11}
[INFO|2024-12-17 13:21:51] logging.py:157 >> {'loss': 0.5740, 'learning_rate': 3.4646e-05, 'epoch': 1.12}
[INFO|2024-12-17 13:22:11] logging.py:157 >> {'loss': 0.5152, 'learning_rate': 3.4357e-05, 'epoch': 1.13}
[INFO|2024-12-17 13:22:32] logging.py:157 >> {'loss': 0.5504, 'learning_rate': 3.4067e-05, 'epoch': 1.14}
[INFO|2024-12-17 13:22:52] logging.py:157 >> {'loss': 0.5547, 'learning_rate': 3.3775e-05, 'epoch': 1.16}
[INFO|2024-12-17 13:23:12] logging.py:157 >> {'loss': 0.5319, 'learning_rate': 3.3482e-05, 'epoch': 1.17}
[INFO|2024-12-17 13:23:32] logging.py:157 >> {'loss': 0.5541, 'learning_rate': 3.3187e-05, 'epoch': 1.18}
[INFO|2024-12-17 13:23:53] logging.py:157 >> {'loss': 0.5329, 'learning_rate': 3.2892e-05, 'epoch': 1.19}
[INFO|2024-12-17 13:23:53] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-500
[INFO|2024-12-17 13:23:53] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-500/tokenizer_config.json
[INFO|2024-12-17 13:23:53] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-500/special_tokens_map.json
[INFO|2024-12-17 13:24:14] logging.py:157 >> {'loss': 0.5964, 'learning_rate': 3.2595e-05, 'epoch': 1.20}
[INFO|2024-12-17 13:24:34] logging.py:157 >> {'loss': 0.5315, 'learning_rate': 3.2296e-05, 'epoch': 1.22}
[INFO|2024-12-17 13:24:54] logging.py:157 >> {'loss': 0.5698, 'learning_rate': 3.1997e-05, 'epoch': 1.23}
[INFO|2024-12-17 13:25:14] logging.py:157 >> {'loss': 0.5234, 'learning_rate': 3.1697e-05, 'epoch': 1.24}
[INFO|2024-12-17 13:25:35] logging.py:157 >> {'loss': 0.5386, 'learning_rate': 3.1395e-05, 'epoch': 1.25}
[INFO|2024-12-17 13:25:55] logging.py:157 >> {'loss': 0.5136, 'learning_rate': 3.1092e-05, 'epoch': 1.26}
[INFO|2024-12-17 13:26:15] logging.py:157 >> {'loss': 0.5341, 'learning_rate': 3.0789e-05, 'epoch': 1.28}
[INFO|2024-12-17 13:26:35] logging.py:157 >> {'loss': 0.5613, 'learning_rate': 3.0485e-05, 'epoch': 1.29}
[INFO|2024-12-17 13:26:56] logging.py:157 >> {'loss': 0.5076, 'learning_rate': 3.0179e-05, 'epoch': 1.30}
[INFO|2024-12-17 13:27:16] logging.py:157 >> {'loss': 0.5115, 'learning_rate': 2.9873e-05, 'epoch': 1.31}
[INFO|2024-12-17 13:27:36] logging.py:157 >> {'loss': 0.5467, 'learning_rate': 2.9567e-05, 'epoch': 1.32}
[INFO|2024-12-17 13:27:56] logging.py:157 >> {'loss': 0.5230, 'learning_rate': 2.9259e-05, 'epoch': 1.34}
[INFO|2024-12-17 13:28:17] logging.py:157 >> {'loss': 0.5541, 'learning_rate': 2.8951e-05, 'epoch': 1.35}
[INFO|2024-12-17 13:28:37] logging.py:157 >> {'loss': 0.5303, 'learning_rate': 2.8642e-05, 'epoch': 1.36}
[INFO|2024-12-17 13:28:57] logging.py:157 >> {'loss': 0.5440, 'learning_rate': 2.8333e-05, 'epoch': 1.37}
[INFO|2024-12-17 13:29:18] logging.py:157 >> {'loss': 0.5596, 'learning_rate': 2.8023e-05, 'epoch': 1.38}
[INFO|2024-12-17 13:29:38] logging.py:157 >> {'loss': 0.5003, 'learning_rate': 2.7713e-05, 'epoch': 1.40}
[INFO|2024-12-17 13:29:58] logging.py:157 >> {'loss': 0.5629, 'learning_rate': 2.7402e-05, 'epoch': 1.41}
[INFO|2024-12-17 13:30:18] logging.py:157 >> {'loss': 0.4516, 'learning_rate': 2.7091e-05, 'epoch': 1.42}
[INFO|2024-12-17 13:30:39] logging.py:157 >> {'loss': 0.5380, 'learning_rate': 2.6779e-05, 'epoch': 1.43}
[INFO|2024-12-17 13:30:39] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-600
[INFO|2024-12-17 13:30:39] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-600/tokenizer_config.json
[INFO|2024-12-17 13:30:39] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-600/special_tokens_map.json
[INFO|2024-12-17 13:31:00] logging.py:157 >> {'loss': 0.5314, 'learning_rate': 2.6467e-05, 'epoch': 1.44}
[INFO|2024-12-17 13:31:20] logging.py:157 >> {'loss': 0.4841, 'learning_rate': 2.6156e-05, 'epoch': 1.45}
[INFO|2024-12-17 13:31:40] logging.py:157 >> {'loss': 0.5133, 'learning_rate': 2.5843e-05, 'epoch': 1.47}
[INFO|2024-12-17 13:32:00] logging.py:157 >> {'loss': 0.5309, 'learning_rate': 2.5531e-05, 'epoch': 1.48}
[INFO|2024-12-17 13:32:21] logging.py:157 >> {'loss': 0.5022, 'learning_rate': 2.5219e-05, 'epoch': 1.49}
[INFO|2024-12-17 13:32:41] logging.py:157 >> {'loss': 0.5254, 'learning_rate': 2.4906e-05, 'epoch': 1.50}
[INFO|2024-12-17 13:33:01] logging.py:157 >> {'loss': 0.5507, 'learning_rate': 2.4594e-05, 'epoch': 1.51}
[INFO|2024-12-17 13:33:21] logging.py:157 >> {'loss': 0.4943, 'learning_rate': 2.4282e-05, 'epoch': 1.53}
[INFO|2024-12-17 13:33:42] logging.py:157 >> {'loss': 0.5111, 'learning_rate': 2.3969e-05, 'epoch': 1.54}
[INFO|2024-12-17 13:34:02] logging.py:157 >> {'loss': 0.5302, 'learning_rate': 2.3657e-05, 'epoch': 1.55}
[INFO|2024-12-17 13:34:22] logging.py:157 >> {'loss': 0.5299, 'learning_rate': 2.3345e-05, 'epoch': 1.56}
[INFO|2024-12-17 13:34:43] logging.py:157 >> {'loss': 0.5453, 'learning_rate': 2.3034e-05, 'epoch': 1.57}
[INFO|2024-12-17 13:35:03] logging.py:157 >> {'loss': 0.5804, 'learning_rate': 2.2723e-05, 'epoch': 1.59}
[INFO|2024-12-17 13:35:23] logging.py:157 >> {'loss': 0.5072, 'learning_rate': 2.2412e-05, 'epoch': 1.60}
[INFO|2024-12-17 13:35:43] logging.py:157 >> {'loss': 0.5287, 'learning_rate': 2.2101e-05, 'epoch': 1.61}
[INFO|2024-12-17 13:36:04] logging.py:157 >> {'loss': 0.5343, 'learning_rate': 2.1791e-05, 'epoch': 1.62}
[INFO|2024-12-17 13:36:24] logging.py:157 >> {'loss': 0.5329, 'learning_rate': 2.1481e-05, 'epoch': 1.63}
[INFO|2024-12-17 13:36:44] logging.py:157 >> {'loss': 0.5591, 'learning_rate': 2.1172e-05, 'epoch': 1.65}
[INFO|2024-12-17 13:37:04] logging.py:157 >> {'loss': 0.5254, 'learning_rate': 2.0864e-05, 'epoch': 1.66}
[INFO|2024-12-17 13:37:25] logging.py:157 >> {'loss': 0.5212, 'learning_rate': 2.0556e-05, 'epoch': 1.67}
[INFO|2024-12-17 13:37:25] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-700
[INFO|2024-12-17 13:37:25] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-700/tokenizer_config.json
[INFO|2024-12-17 13:37:25] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-700/special_tokens_map.json
[INFO|2024-12-17 13:37:46] logging.py:157 >> {'loss': 0.5276, 'learning_rate': 2.0249e-05, 'epoch': 1.68}
[INFO|2024-12-17 13:38:06] logging.py:157 >> {'loss': 0.5203, 'learning_rate': 1.9943e-05, 'epoch': 1.69}
[INFO|2024-12-17 13:38:26] logging.py:157 >> {'loss': 0.5291, 'learning_rate': 1.9637e-05, 'epoch': 1.71}
[INFO|2024-12-17 13:38:46] logging.py:157 >> {'loss': 0.5299, 'learning_rate': 1.9333e-05, 'epoch': 1.72}
[INFO|2024-12-17 13:39:07] logging.py:157 >> {'loss': 0.5476, 'learning_rate': 1.9029e-05, 'epoch': 1.73}
[INFO|2024-12-17 13:39:27] logging.py:157 >> {'loss': 0.5333, 'learning_rate': 1.8726e-05, 'epoch': 1.74}
[INFO|2024-12-17 13:39:47] logging.py:157 >> {'loss': 0.4962, 'learning_rate': 1.8424e-05, 'epoch': 1.75}
[INFO|2024-12-17 13:40:07] logging.py:157 >> {'loss': 0.4986, 'learning_rate': 1.8123e-05, 'epoch': 1.77}
[INFO|2024-12-17 13:40:28] logging.py:157 >> {'loss': 0.5461, 'learning_rate': 1.7823e-05, 'epoch': 1.78}
[INFO|2024-12-17 13:40:48] logging.py:157 >> {'loss': 0.5392, 'learning_rate': 1.7525e-05, 'epoch': 1.79}
[INFO|2024-12-17 13:41:08] logging.py:157 >> {'loss': 0.5162, 'learning_rate': 1.7227e-05, 'epoch': 1.80}
[INFO|2024-12-17 13:41:28] logging.py:157 >> {'loss': 0.5431, 'learning_rate': 1.6931e-05, 'epoch': 1.81}
[INFO|2024-12-17 13:41:49] logging.py:157 >> {'loss': 0.5037, 'learning_rate': 1.6636e-05, 'epoch': 1.82}
[INFO|2024-12-17 13:42:09] logging.py:157 >> {'loss': 0.5164, 'learning_rate': 1.6342e-05, 'epoch': 1.84}
[INFO|2024-12-17 13:42:29] logging.py:157 >> {'loss': 0.4950, 'learning_rate': 1.6050e-05, 'epoch': 1.85}
[INFO|2024-12-17 13:42:49] logging.py:157 >> {'loss': 0.5834, 'learning_rate': 1.5759e-05, 'epoch': 1.86}
[INFO|2024-12-17 13:43:10] logging.py:157 >> {'loss': 0.5459, 'learning_rate': 1.5469e-05, 'epoch': 1.87}
[INFO|2024-12-17 13:43:30] logging.py:157 >> {'loss': 0.5160, 'learning_rate': 1.5181e-05, 'epoch': 1.88}
[INFO|2024-12-17 13:43:50] logging.py:157 >> {'loss': 0.5519, 'learning_rate': 1.4894e-05, 'epoch': 1.90}
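Each checkpoint-NNN directory written above holds only the LoRA adapter and tokenizer files, not full model weights, so an intermediate checkpoint can be inspected by re-attaching it to the base model. A hedged sketch (checkpoint-700 is simply the most recent directory at this point in the log; the paths are taken from it):

```python
# Hedged sketch: loading one of the intermediate LoRA checkpoints saved above
# for a quick qualitative check. Paths are taken from the log.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "/media/omnisky/Extreme SSD/hzq/LLMmodels/Qwen2-7B-Instruct"
ckpt_path = "saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-700"

base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16).to("cuda:0")
tokenizer = AutoTokenizer.from_pretrained(ckpt_path)
model = PeftModel.from_pretrained(base, ckpt_path)  # attach the adapter weights
model.eval()
```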
[INFO|2024-12-17 13:44:10] logging.py:157 >> {'loss': 0.5173, 'learning_rate': 1.4609e-05, 'epoch': 1.91}
[INFO|2024-12-17 13:44:10] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-800
[INFO|2024-12-17 13:44:11] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-800/tokenizer_config.json
[INFO|2024-12-17 13:44:11] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-800/special_tokens_map.json
[INFO|2024-12-17 13:44:31] logging.py:157 >> {'loss': 0.5170, 'learning_rate': 1.4326e-05, 'epoch': 1.92}
[INFO|2024-12-17 13:44:52] logging.py:157 >> {'loss': 0.4906, 'learning_rate': 1.4044e-05, 'epoch': 1.93}
[INFO|2024-12-17 13:45:12] logging.py:157 >> {'loss': 0.5598, 'learning_rate': 1.3765e-05, 'epoch': 1.94}
[INFO|2024-12-17 13:45:32] logging.py:157 >> {'loss': 0.5276, 'learning_rate': 1.3486e-05, 'epoch': 1.96}
[INFO|2024-12-17 13:45:52] logging.py:157 >> {'loss': 0.4966, 'learning_rate': 1.3210e-05, 'epoch': 1.97}
[INFO|2024-12-17 13:46:13] logging.py:157 >> {'loss': 0.5133, 'learning_rate': 1.2935e-05, 'epoch': 1.98}
[INFO|2024-12-17 13:46:33] logging.py:157 >> {'loss': 0.4814, 'learning_rate': 1.2663e-05, 'epoch': 1.99}
[INFO|2024-12-17 13:46:53] logging.py:157 >> {'loss': 0.6167, 'learning_rate': 1.2392e-05, 'epoch': 2.00}
[INFO|2024-12-17 13:47:13] logging.py:157 >> {'loss': 0.4915, 'learning_rate': 1.2123e-05, 'epoch': 2.02}
[INFO|2024-12-17 13:47:34] logging.py:157 >> {'loss': 0.5116, 'learning_rate': 1.1856e-05, 'epoch': 2.03}
[INFO|2024-12-17 13:47:54] logging.py:157 >> {'loss': 0.4754, 'learning_rate': 1.1592e-05, 'epoch': 2.04}
[INFO|2024-12-17 13:48:14] logging.py:157 >> {'loss': 0.4426, 'learning_rate': 1.1329e-05, 'epoch': 2.05}
[INFO|2024-12-17 13:48:34] logging.py:157 >> {'loss': 0.5026, 'learning_rate': 1.1069e-05, 'epoch': 2.06}
[INFO|2024-12-17 13:48:54] logging.py:157 >> {'loss': 0.4872, 'learning_rate': 1.0810e-05, 'epoch': 2.08}
[INFO|2024-12-17 13:49:15] logging.py:157 >> {'loss': 0.5022, 'learning_rate': 1.0554e-05, 'epoch': 2.09}
[INFO|2024-12-17 13:49:35] logging.py:157 >> {'loss': 0.5388, 'learning_rate': 1.0300e-05, 'epoch': 2.10}
[INFO|2024-12-17 13:49:55] logging.py:157 >> {'loss': 0.4810, 'learning_rate': 1.0049e-05, 'epoch': 2.11}
[INFO|2024-12-17 13:50:15] logging.py:157 >> {'loss': 0.4829, 'learning_rate': 9.7996e-06, 'epoch': 2.12}
[INFO|2024-12-17 13:50:36] logging.py:157 >> {'loss': 0.4902, 'learning_rate': 9.5527e-06, 'epoch': 2.13}
[INFO|2024-12-17 13:50:56] logging.py:157 >> {'loss': 0.5297, 'learning_rate': 9.3083e-06, 'epoch': 2.15}
[INFO|2024-12-17 13:50:56] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-900
[INFO|2024-12-17 13:50:56] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-900/tokenizer_config.json
[INFO|2024-12-17 13:50:56] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-900/special_tokens_map.json
[INFO|2024-12-17 13:51:17] logging.py:157 >> {'loss': 0.4845, 'learning_rate': 9.0663e-06, 'epoch': 2.16}
[INFO|2024-12-17 13:51:37] logging.py:157 >> {'loss': 0.5086, 'learning_rate': 8.8268e-06, 'epoch': 2.17}
[INFO|2024-12-17 13:51:57] logging.py:157 >> {'loss': 0.5136, 'learning_rate': 8.5899e-06, 'epoch': 2.18}
[INFO|2024-12-17 13:52:18] logging.py:157 >> {'loss': 0.5137, 'learning_rate': 8.3555e-06, 'epoch': 2.19}
[INFO|2024-12-17 13:52:38] logging.py:157 >> {'loss': 0.5032, 'learning_rate': 8.1237e-06, 'epoch': 2.21}
[INFO|2024-12-17 13:52:58] logging.py:157 >> {'loss': 0.4658, 'learning_rate': 7.8945e-06, 'epoch': 2.22}
[INFO|2024-12-17 13:53:18] logging.py:157 >> {'loss': 0.5261, 'learning_rate': 7.6680e-06, 'epoch': 2.23}
[INFO|2024-12-17 13:53:39] logging.py:157 >> {'loss': 0.5216, 'learning_rate': 7.4442e-06, 'epoch': 2.24}
[INFO|2024-12-17 13:53:59] logging.py:157 >> {'loss': 0.4949, 'learning_rate': 7.2232e-06, 'epoch': 2.25}
[INFO|2024-12-17 13:54:19] logging.py:157 >> {'loss': 0.4640, 'learning_rate': 7.0049e-06, 'epoch': 2.27}
[INFO|2024-12-17 13:54:39] logging.py:157 >> {'loss': 0.5057, 'learning_rate': 6.7895e-06, 'epoch': 2.28}
[INFO|2024-12-17 13:54:59] logging.py:157 >> {'loss': 0.4875, 'learning_rate': 6.5769e-06, 'epoch': 2.29}
[INFO|2024-12-17 13:55:20] logging.py:157 >> {'loss': 0.4753, 'learning_rate': 6.3671e-06, 'epoch': 2.30}
[INFO|2024-12-17 13:55:40] logging.py:157 >> {'loss': 0.4725, 'learning_rate': 6.1603e-06, 'epoch': 2.31}
[INFO|2024-12-17 13:56:00] logging.py:157 >> {'loss': 0.5095, 'learning_rate': 5.9564e-06, 'epoch': 2.33}
[INFO|2024-12-17 13:56:20] logging.py:157 >> {'loss': 0.5099, 'learning_rate': 5.7555e-06, 'epoch': 2.34}
[INFO|2024-12-17 13:56:41] logging.py:157 >> {'loss': 0.4744, 'learning_rate': 5.5576e-06, 'epoch': 2.35}
[INFO|2024-12-17 13:57:01] logging.py:157 >> {'loss': 0.5126, 'learning_rate': 5.3627e-06, 'epoch': 2.36}
[INFO|2024-12-17 13:57:21] logging.py:157 >> {'loss': 0.4636, 'learning_rate': 5.1709e-06, 'epoch': 2.37}
[INFO|2024-12-17 13:57:41] logging.py:157 >> {'loss': 0.4817, 'learning_rate': 4.9822e-06, 'epoch': 2.39}
[INFO|2024-12-17 13:57:41] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-1000
[INFO|2024-12-17 13:57:41] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-1000/tokenizer_config.json
[INFO|2024-12-17 13:57:41] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-1000/special_tokens_map.json
[INFO|2024-12-17 13:58:02] logging.py:157 >> {'loss': 0.5238, 'learning_rate': 4.7966e-06, 'epoch': 2.40}
[INFO|2024-12-17 13:58:22] logging.py:157 >> {'loss': 0.4462, 'learning_rate': 4.6142e-06, 'epoch': 2.41}
[INFO|2024-12-17 13:58:43] logging.py:157 >> {'loss': 0.4730, 'learning_rate': 4.4350e-06, 'epoch': 2.42}
[INFO|2024-12-17 13:59:03] logging.py:157 >> {'loss': 0.5256, 'learning_rate': 4.2589e-06, 'epoch': 2.43}
[INFO|2024-12-17 13:59:23] logging.py:157 >> {'loss': 0.4561, 'learning_rate': 4.0861e-06, 'epoch': 2.44}
[INFO|2024-12-17 13:59:43] logging.py:157 >> {'loss': 0.5378, 'learning_rate': 3.9166e-06, 'epoch': 2.46}
[INFO|2024-12-17 14:00:04] logging.py:157 >> {'loss': 0.5014, 'learning_rate': 3.7504e-06, 'epoch': 2.47}
[INFO|2024-12-17 14:00:24] logging.py:157 >> {'loss': 0.4967, 'learning_rate': 3.5875e-06, 'epoch': 2.48}
[INFO|2024-12-17 14:00:44] logging.py:157 >> {'loss': 0.4403, 'learning_rate': 3.4279e-06, 'epoch': 2.49}
[INFO|2024-12-17 14:01:04] logging.py:157 >> {'loss': 0.5167, 'learning_rate': 3.2717e-06, 'epoch': 2.50}
[INFO|2024-12-17 14:01:25] logging.py:157 >> {'loss': 0.4748, 'learning_rate': 3.1189e-06, 'epoch': 2.52}
[INFO|2024-12-17 14:01:45] logging.py:157 >> {'loss': 0.5000, 'learning_rate': 2.9695e-06, 'epoch': 2.53}
[INFO|2024-12-17 14:02:05] logging.py:157 >> {'loss': 0.4844, 'learning_rate': 2.8235e-06, 'epoch': 2.54}
[INFO|2024-12-17 14:02:25] logging.py:157 >> {'loss': 0.4588, 'learning_rate': 2.6810e-06, 'epoch': 2.55}
[INFO|2024-12-17 14:02:45] logging.py:157 >> {'loss': 0.4561, 'learning_rate': 2.5420e-06, 'epoch': 2.56}
[INFO|2024-12-17 14:03:06] logging.py:157 >> {'loss': 0.4869, 'learning_rate': 2.4065e-06, 'epoch': 2.58}
[INFO|2024-12-17 14:03:26] logging.py:157 >> {'loss': 0.4966, 'learning_rate': 2.2746e-06, 'epoch': 2.59}
[INFO|2024-12-17 14:03:46] logging.py:157 >> {'loss': 0.4629, 'learning_rate': 2.1461e-06, 'epoch': 2.60}
[INFO|2024-12-17 14:04:06] logging.py:157 >> {'loss': 0.5041, 'learning_rate': 2.0213e-06, 'epoch': 2.61}
[INFO|2024-12-17 14:04:27] logging.py:157 >> {'loss': 0.5230, 'learning_rate': 1.9000e-06, 'epoch': 2.62}
[INFO|2024-12-17 14:04:27] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-1100
[INFO|2024-12-17 14:04:27] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-1100/tokenizer_config.json
[INFO|2024-12-17 14:04:27] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-1100/special_tokens_map.json
[INFO|2024-12-17 14:04:47] logging.py:157 >> {'loss': 0.4703, 'learning_rate': 1.7824e-06, 'epoch': 2.64}
[INFO|2024-12-17 14:05:08] logging.py:157 >> {'loss': 0.4987, 'learning_rate': 1.6683e-06, 'epoch': 2.65}
[INFO|2024-12-17 14:05:28] logging.py:157 >> {'loss': 0.5113, 'learning_rate': 1.5579e-06, 'epoch': 2.66}
[INFO|2024-12-17 14:05:48] logging.py:157 >> {'loss': 0.4720, 'learning_rate': 1.4512e-06, 'epoch': 2.67}
[INFO|2024-12-17 14:06:08] logging.py:157 >> {'loss': 0.5110, 'learning_rate': 1.3482e-06, 'epoch': 2.68}
[INFO|2024-12-17 14:06:28] logging.py:157 >> {'loss': 0.5292, 'learning_rate': 1.2488e-06, 'epoch': 2.70}
[INFO|2024-12-17 14:06:49] logging.py:157 >> {'loss': 0.4581, 'learning_rate': 1.1532e-06, 'epoch': 2.71}
[INFO|2024-12-17 14:07:09] logging.py:157 >> {'loss': 0.5219, 'learning_rate': 1.0612e-06, 'epoch': 2.72}
[INFO|2024-12-17 14:07:29] logging.py:157 >> {'loss': 0.4821, 'learning_rate': 9.7306e-07, 'epoch': 2.73}
[INFO|2024-12-17 14:07:49] logging.py:157 >> {'loss': 0.4483, 'learning_rate': 8.8862e-07, 'epoch': 2.74}
[INFO|2024-12-17 14:08:10] logging.py:157 >> {'loss': 0.5451, 'learning_rate': 8.0795e-07, 'epoch': 2.75}
[INFO|2024-12-17 14:08:30] logging.py:157 >> {'loss': 0.4644, 'learning_rate': 7.3106e-07, 'epoch': 2.77}
[INFO|2024-12-17 14:08:50] logging.py:157 >> {'loss': 0.5294, 'learning_rate': 6.5796e-07, 'epoch': 2.78}
[INFO|2024-12-17 14:09:10] logging.py:157 >> {'loss': 0.4743, 'learning_rate': 5.8866e-07, 'epoch': 2.79}
[INFO|2024-12-17 14:09:30] logging.py:157 >> {'loss': 0.5137, 'learning_rate': 5.2317e-07, 'epoch': 2.80}
[INFO|2024-12-17 14:09:51] logging.py:157 >> {'loss': 0.4973, 'learning_rate': 4.6151e-07, 'epoch': 2.81}
[INFO|2024-12-17 14:10:11] logging.py:157 >> {'loss': 0.4836, 'learning_rate': 4.0368e-07, 'epoch': 2.83}
[INFO|2024-12-17 14:10:31] logging.py:157 >> {'loss': 0.5003, 'learning_rate': 3.4968e-07, 'epoch': 2.84}
[INFO|2024-12-17 14:10:51] logging.py:157 >> {'loss': 0.4861, 'learning_rate': 2.9954e-07, 'epoch': 2.85}
[INFO|2024-12-17 14:11:12] logging.py:157 >> {'loss': 0.4464, 'learning_rate': 2.5325e-07, 'epoch': 2.86}
[INFO|2024-12-17 14:11:12] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-1200
[INFO|2024-12-17 14:11:12] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-1200/tokenizer_config.json
[INFO|2024-12-17 14:11:12] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-1200/special_tokens_map.json
[INFO|2024-12-17 14:11:33] logging.py:157 >> {'loss': 0.5013, 'learning_rate': 2.1083e-07, 'epoch': 2.87}
[INFO|2024-12-17 14:11:53] logging.py:157 >> {'loss': 0.4679, 'learning_rate': 1.7228e-07, 'epoch': 2.89}
[INFO|2024-12-17 14:12:13] logging.py:157 >> {'loss': 0.4972, 'learning_rate': 1.3761e-07, 'epoch': 2.90}
[INFO|2024-12-17 14:12:33] logging.py:157 >> {'loss': 0.4953, 'learning_rate': 1.0682e-07, 'epoch': 2.91}
[INFO|2024-12-17 14:12:54] logging.py:157 >> {'loss': 0.4842, 'learning_rate': 7.9911e-08, 'epoch': 2.92}
[INFO|2024-12-17 14:13:14] logging.py:157 >> {'loss': 0.4684, 'learning_rate': 5.6899e-08, 'epoch': 2.93}
[INFO|2024-12-17 14:13:34] logging.py:157 >> {'loss': 0.4477, 'learning_rate': 3.7781e-08, 'epoch': 2.95}
[INFO|2024-12-17 14:13:54] logging.py:157 >> {'loss': 0.4787, 'learning_rate': 2.2562e-08, 'epoch': 2.96}
[INFO|2024-12-17 14:14:15] logging.py:157 >> {'loss': 0.5320, 'learning_rate': 1.1243e-08, 'epoch': 2.97}
[INFO|2024-12-17 14:14:35] logging.py:157 >> {'loss': 0.5079, 'learning_rate': 3.8258e-09, 'epoch': 2.98}
[INFO|2024-12-17 14:14:55] logging.py:157 >> {'loss': 0.4780, 'learning_rate': 3.1232e-10, 'epoch': 2.99}
[INFO|2024-12-17 14:15:03] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-1257
[INFO|2024-12-17 14:15:03] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-1257/tokenizer_config.json
[INFO|2024-12-17 14:15:03] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/checkpoint-1257/special_tokens_map.json
[INFO|2024-12-17 14:15:04] trainer.py:2584 >> Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|2024-12-17 14:15:04] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28
[INFO|2024-12-17 14:15:04] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/tokenizer_config.json
[INFO|2024-12-17 14:15:04] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28/special_tokens_map.json
[WARNING|2024-12-17 14:15:04] logging.py:162 >> No metric eval_loss to plot.
[WARNING|2024-12-17 14:15:04] logging.py:162 >> No metric eval_accuracy to plot.
[INFO|2024-12-17 14:15:04] modelcard.py:449 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
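The two warnings at the end only mean that no evaluation metrics were produced for this run (no validation split was configured), so there is no eval_loss or eval_accuracy curve to plot. The final adapter was written to the run directory itself (saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28); if a standalone model is wanted for deployment, the adapter can be merged back into the base weights. A hedged sketch, with an illustrative output directory name:

```python
# Hedged sketch: merging the finished LoRA adapter into the base model so it can
# be served without PEFT. The output directory name is hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "/media/omnisky/Extreme SSD/hzq/LLMmodels/Qwen2-7B-Instruct"
adapter_path = "saves/Qwen2-7B-Instruct/lora/train_2024-12-17-12-48-28"
out_dir = "Qwen2-7B-Instruct-lora-merged"  # hypothetical output directory

base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, adapter_path).merge_and_unload()  # fold LoRA deltas into base weights

merged.save_pretrained(out_dir)
AutoTokenizer.from_pretrained(adapter_path).save_pretrained(out_dir)
```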