06/17/2024 20:24:18 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/17/2024 20:24:18 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
06/17/2024 20:24:22 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_en...
06/17/2024 20:24:27 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_zh...
06/17/2024 20:24:32 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_en...
06/17/2024 20:24:36 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/17/2024 20:24:36 - INFO - transformers.configuration_utils - Model config Qwen2Config { "_name_or_path": "Qwen/Qwen2-7B-Instruct", "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 152064 }
06/17/2024 20:24:36 - INFO - llamafactory.model.model_utils.quantization - Quantizing model to 4 bit.
06/17/2024 20:24:36 - INFO - transformers.modeling_utils - loading weights file model.safetensors from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/model.safetensors.index.json
06/17/2024 20:24:36 - INFO - transformers.modeling_utils - Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
06/17/2024 20:24:36 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig { "bos_token_id": 151643, "eos_token_id": 151645 }
06/17/2024 20:24:37 - INFO - llamafactory.model.model_utils.quantization - Quantizing model to 4 bit.
06/17/2024 20:25:04 - INFO - transformers.modeling_utils - All model checkpoint weights were used when initializing Qwen2ForCausalLM.
06/17/2024 20:25:04 - INFO - transformers.modeling_utils - All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2-7B-Instruct. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
06/17/2024 20:25:04 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
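The "Quantizing model to 4 bit" lines come from on-the-fly bitsandbytes quantization of the base model, i.e. a QLoRA-style load. A minimal sketch of an equivalent standalone load, assuming NF4 with double quantization and float16 compute (the exact quantization settings are not printed in the log):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed settings: the log only states "Quantizing model to 4 bit" and
# "Instantiating Qwen2ForCausalLM model under default dtype torch.float16".
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)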
06/17/2024 20:25:04 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.misc - Found linear modules: o_proj,k_proj,up_proj,gate_proj,v_proj,q_proj,down_proj
06/17/2024 20:25:04 - INFO - transformers.generation.configuration_utils - loading configuration file generation_config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/generation_config.json
06/17/2024 20:25:04 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig { "bos_token_id": 151643, "do_sample": true, "eos_token_id": [ 151645, 151643 ], "pad_token_id": 151643, "repetition_penalty": 1.05, "temperature": 0.7, "top_k": 20, "top_p": 0.8 }
06/17/2024 20:25:04 - INFO - llamafactory.model.loader - trainable params: 20185088 || all params: 7635801600 || trainable%: 0.2643
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
06/17/2024 20:25:04 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/17/2024 20:25:04 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.misc - Found linear modules: o_proj,up_proj,down_proj,q_proj,v_proj,k_proj,gate_proj
06/17/2024 20:25:05 - INFO - llamafactory.model.loader - trainable params: 20185088 || all params: 7635801600 || trainable%: 0.2643
06/17/2024 20:25:05 - WARNING - accelerate.utils.other - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
06/17/2024 20:25:05 - INFO - transformers.trainer - Using auto half precision backend
06/17/2024 20:25:05 - INFO - transformers.trainer - ***** Running training *****
06/17/2024 20:25:05 - INFO - transformers.trainer - Num examples = 2,000
06/17/2024 20:25:05 - INFO - transformers.trainer - Num Epochs = 3
06/17/2024 20:25:05 - INFO - transformers.trainer - Instantaneous batch size per device = 1
06/17/2024 20:25:05 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 16
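LoRA is attached to all seven linear projections listed in the "Found linear modules" lines, and the reported 20,185,088 trainable parameters are consistent with rank-8 adapters (the LLaMA-Factory default): for Qwen2-7B, summing in_features + out_features over q/k/v/o/gate/up/down gives 90,112 per layer, and 8 x 90,112 x 28 layers = 20,185,088, about 0.26% of the 7.6B total. The duplicated adapter and loader lines are consistent with two training processes, which also matches the total train batch size of 16 (1 per device x 8 accumulation steps x 2 devices). A rough PEFT equivalent of the adapter setup, assuming rank 8, alpha 16 and no LoRA dropout (these values are not printed in the log):

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# `model` as loaded in the 4-bit sketch above. prepare_model_for_kbit_training
# corresponds to the "Gradient checkpointing enabled" / "Upcasting trainable
# params to float32" lines in the log.
model = prepare_model_for_kbit_training(model)

# Hypothetical values: only the target modules and the trainable-parameter
# count appear in the log; r=8 / lora_alpha=16 / lora_dropout=0.0 are assumed defaults.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# expected: roughly "trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643"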
parallel, distributed & accumulation) = 16 06/17/2024 20:25:05 - INFO - transformers.trainer - Gradient Accumulation steps = 8 06/17/2024 20:25:05 - INFO - transformers.trainer - Total optimization steps = 375 06/17/2024 20:25:05 - INFO - transformers.trainer - Number of trainable parameters = 20,185,088 06/17/2024 20:26:04 - INFO - llamafactory.extras.callbacks - {'loss': 0.6880, 'learning_rate': 4.9978e-05, 'epoch': 0.04, 'throughput': 807.59} 06/17/2024 20:26:54 - INFO - llamafactory.extras.callbacks - {'loss': 0.7630, 'learning_rate': 4.9912e-05, 'epoch': 0.08, 'throughput': 788.63} 06/17/2024 20:27:47 - INFO - llamafactory.extras.callbacks - {'loss': 0.6882, 'learning_rate': 4.9803e-05, 'epoch': 0.12, 'throughput': 782.46} 06/17/2024 20:28:41 - INFO - llamafactory.extras.callbacks - {'loss': 0.6951, 'learning_rate': 4.9650e-05, 'epoch': 0.16, 'throughput': 778.58} 06/17/2024 20:29:32 - INFO - llamafactory.extras.callbacks - {'loss': 0.5008, 'learning_rate': 4.9454e-05, 'epoch': 0.20, 'throughput': 776.76} 06/17/2024 20:30:18 - INFO - llamafactory.extras.callbacks - {'loss': 0.5420, 'learning_rate': 4.9215e-05, 'epoch': 0.24, 'throughput': 780.46} 06/17/2024 20:31:09 - INFO - llamafactory.extras.callbacks - {'loss': 0.5369, 'learning_rate': 4.8933e-05, 'epoch': 0.28, 'throughput': 781.27} 06/17/2024 20:32:05 - INFO - llamafactory.extras.callbacks - {'loss': 0.4948, 'learning_rate': 4.8609e-05, 'epoch': 0.32, 'throughput': 780.39} 06/17/2024 20:32:52 - INFO - llamafactory.extras.callbacks - {'loss': 0.5244, 'learning_rate': 4.8244e-05, 'epoch': 0.36, 'throughput': 778.70} 06/17/2024 20:33:37 - INFO - llamafactory.extras.callbacks - {'loss': 0.4210, 'learning_rate': 4.7839e-05, 'epoch': 0.40, 'throughput': 780.25} 06/17/2024 20:34:25 - INFO - llamafactory.extras.callbacks - {'loss': 0.4517, 'learning_rate': 4.7393e-05, 'epoch': 0.44, 'throughput': 779.83} 06/17/2024 20:35:19 - INFO - llamafactory.extras.callbacks - {'loss': 0.4661, 'learning_rate': 4.6908e-05, 'epoch': 0.48, 'throughput': 775.58} 06/17/2024 20:36:09 - INFO - llamafactory.extras.callbacks - {'loss': 0.4928, 'learning_rate': 4.6384e-05, 'epoch': 0.52, 'throughput': 775.62} 06/17/2024 20:37:00 - INFO - llamafactory.extras.callbacks - {'loss': 0.5424, 'learning_rate': 4.5823e-05, 'epoch': 0.56, 'throughput': 775.79} 06/17/2024 20:37:52 - INFO - llamafactory.extras.callbacks - {'loss': 0.5419, 'learning_rate': 4.5225e-05, 'epoch': 0.60, 'throughput': 774.15} 06/17/2024 20:38:39 - INFO - llamafactory.extras.callbacks - {'loss': 0.4558, 'learning_rate': 4.4592e-05, 'epoch': 0.64, 'throughput': 774.75} 06/17/2024 20:39:27 - INFO - llamafactory.extras.callbacks - {'loss': 0.5656, 'learning_rate': 4.3925e-05, 'epoch': 0.68, 'throughput': 776.75} 06/17/2024 20:40:18 - INFO - llamafactory.extras.callbacks - {'loss': 0.4832, 'learning_rate': 4.3224e-05, 'epoch': 0.72, 'throughput': 780.75} 06/17/2024 20:41:04 - INFO - llamafactory.extras.callbacks - {'loss': 0.4626, 'learning_rate': 4.2492e-05, 'epoch': 0.76, 'throughput': 781.15} 06/17/2024 20:41:56 - INFO - llamafactory.extras.callbacks - {'loss': 0.4837, 'learning_rate': 4.1728e-05, 'epoch': 0.80, 'throughput': 780.33} 06/17/2024 20:41:56 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100 06/17/2024 20:41:57 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at 
06/17/2024 20:41:57 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/17/2024 20:41:57 - INFO - transformers.configuration_utils - Model config Qwen2Config { "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 152064 }
06/17/2024 20:41:57 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100/tokenizer_config.json
06/17/2024 20:41:57 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100/special_tokens_map.json
06/17/2024 20:42:48 - INFO - llamafactory.extras.callbacks - {'loss': 0.5144, 'learning_rate': 4.0936e-05, 'epoch': 0.84, 'throughput': 779.83}
06/17/2024 20:43:33 - INFO - llamafactory.extras.callbacks - {'loss': 0.4930, 'learning_rate': 4.0115e-05, 'epoch': 0.88, 'throughput': 780.58}
06/17/2024 20:44:20 - INFO - llamafactory.extras.callbacks - {'loss': 0.4083, 'learning_rate': 3.9268e-05, 'epoch': 0.92, 'throughput': 781.80}
06/17/2024 20:45:14 - INFO - llamafactory.extras.callbacks - {'loss': 0.5172, 'learning_rate': 3.8396e-05, 'epoch': 0.96, 'throughput': 782.01}
06/17/2024 20:46:10 - INFO - llamafactory.extras.callbacks - {'loss': 0.5843, 'learning_rate': 3.7500e-05, 'epoch': 1.00, 'throughput': 782.22}
06/17/2024 20:46:58 - INFO - llamafactory.extras.callbacks - {'loss': 0.4567, 'learning_rate': 3.6582e-05, 'epoch': 1.04, 'throughput': 783.58}
06/17/2024 20:47:43 - INFO - llamafactory.extras.callbacks - {'loss': 0.4180, 'learning_rate': 3.5644e-05, 'epoch': 1.08, 'throughput': 784.96}
06/17/2024 20:48:37 - INFO - llamafactory.extras.callbacks - {'loss': 0.3785, 'learning_rate': 3.4688e-05, 'epoch': 1.12, 'throughput': 783.96}
06/17/2024 20:49:22 - INFO - llamafactory.extras.callbacks - {'loss': 0.4097, 'learning_rate': 3.3714e-05, 'epoch': 1.16, 'throughput': 783.11}
06/17/2024 20:50:09 - INFO - llamafactory.extras.callbacks - {'loss': 0.4507, 'learning_rate': 3.2725e-05, 'epoch': 1.20, 'throughput': 783.25}
06/17/2024 20:51:00 - INFO - llamafactory.extras.callbacks - {'loss': 0.3680, 'learning_rate': 3.1723e-05, 'epoch': 1.24, 'throughput': 782.23}
06/17/2024 20:51:53 - INFO - llamafactory.extras.callbacks - {'loss': 0.4301, 'learning_rate': 3.0709e-05, 'epoch': 1.28, 'throughput': 782.26}
06/17/2024 20:52:43 - INFO - llamafactory.extras.callbacks - {'loss': 0.4488, 'learning_rate': 2.9685e-05, 'epoch': 1.32, 'throughput': 781.89}
06/17/2024 20:53:33 - INFO - llamafactory.extras.callbacks - {'loss': 0.4075, 'learning_rate': 2.8652e-05, 'epoch': 1.36, 'throughput': 781.74}
06/17/2024 20:54:30 - INFO - llamafactory.extras.callbacks - {'loss': 0.4991, 'learning_rate': 2.7613e-05, 'epoch': 1.40, 'throughput': 781.86}
06/17/2024 20:55:20 - INFO - llamafactory.extras.callbacks - {'loss': 0.4894, 'learning_rate': 2.6570e-05, 'epoch': 1.44, 'throughput': 782.49}
06/17/2024 20:56:12 - INFO - llamafactory.extras.callbacks - {'loss': 0.4967, 'learning_rate': 2.5524e-05, 'epoch': 1.48, 'throughput': 782.06}
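Each "Saving model checkpoint" block above writes only the LoRA adapter weights plus the tokenizer files into the checkpoint directory, not a full copy of the 7B model. A sketch of sanity-checking such an intermediate checkpoint, assuming the usual PEFT adapter layout (adapter_config.json plus adapter weights) inside checkpoint-100:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100"

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = PeftModel.from_pretrained(base, ckpt)  # attach the saved LoRA adapter

# Hypothetical prompt in the style of the glaive_toolcall data.
messages = [{"role": "user", "content": "What is the weather like in Boston today?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))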
06/17/2024 20:57:06 - INFO - llamafactory.extras.callbacks - {'loss': 0.5297, 'learning_rate': 2.4476e-05, 'epoch': 1.52, 'throughput': 783.03}
06/17/2024 20:57:55 - INFO - llamafactory.extras.callbacks - {'loss': 0.3939, 'learning_rate': 2.3430e-05, 'epoch': 1.56, 'throughput': 781.82}
06/17/2024 20:58:49 - INFO - llamafactory.extras.callbacks - {'loss': 0.4610, 'learning_rate': 2.2387e-05, 'epoch': 1.60, 'throughput': 781.19}
06/17/2024 20:58:49 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-200
06/17/2024 20:58:50 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/17/2024 20:58:50 - INFO - transformers.configuration_utils - Model config Qwen2Config { "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 152064 }
06/17/2024 20:58:50 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-200/tokenizer_config.json
06/17/2024 20:58:50 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-200/special_tokens_map.json
06/17/2024 20:59:48 - INFO - llamafactory.extras.callbacks - {'loss': 0.4622, 'learning_rate': 2.1348e-05, 'epoch': 1.64, 'throughput': 779.87}
06/17/2024 21:00:36 - INFO - llamafactory.extras.callbacks - {'loss': 0.4043, 'learning_rate': 2.0315e-05, 'epoch': 1.68, 'throughput': 779.61}
06/17/2024 21:01:25 - INFO - llamafactory.extras.callbacks - {'loss': 0.4280, 'learning_rate': 1.9291e-05, 'epoch': 1.72, 'throughput': 779.45}
06/17/2024 21:02:14 - INFO - llamafactory.extras.callbacks - {'loss': 0.3779, 'learning_rate': 1.8277e-05, 'epoch': 1.76, 'throughput': 778.48}
06/17/2024 21:03:05 - INFO - llamafactory.extras.callbacks - {'loss': 0.4526, 'learning_rate': 1.7275e-05, 'epoch': 1.80, 'throughput': 779.26}
06/17/2024 21:03:56 - INFO - llamafactory.extras.callbacks - {'loss': 0.4627, 'learning_rate': 1.6286e-05, 'epoch': 1.84, 'throughput': 779.12}
06/17/2024 21:04:48 - INFO - llamafactory.extras.callbacks - {'loss': 0.4873, 'learning_rate': 1.5312e-05, 'epoch': 1.88, 'throughput': 779.09}
06/17/2024 21:05:40 - INFO - llamafactory.extras.callbacks - {'loss': 0.3234, 'learning_rate': 1.4356e-05, 'epoch': 1.92, 'throughput': 780.05}
06/17/2024 21:06:28 - INFO - llamafactory.extras.callbacks - {'loss': 0.4438, 'learning_rate': 1.3418e-05, 'epoch': 1.96, 'throughput': 780.37}
06/17/2024 21:07:21 - INFO - llamafactory.extras.callbacks - {'loss': 0.4407, 'learning_rate': 1.2500e-05, 'epoch': 2.00, 'throughput': 779.97}
06/17/2024 21:08:15 - INFO - llamafactory.extras.callbacks - {'loss': 0.4401, 'learning_rate': 1.1604e-05, 'epoch': 2.04, 'throughput': 779.51}
06/17/2024 21:09:04 - INFO - llamafactory.extras.callbacks - {'loss': 0.3771, 'learning_rate': 1.0732e-05, 'epoch': 2.08, 'throughput': 780.06}
06/17/2024 21:09:57 - INFO - llamafactory.extras.callbacks - {'loss': 0.4043, 'learning_rate': 9.8850e-06, 'epoch': 2.12, 'throughput': 781.03}
06/17/2024 21:10:42 - INFO - llamafactory.extras.callbacks - {'loss': 0.4018, 'learning_rate': 9.0644e-06, 'epoch': 2.16, 'throughput': 781.14}
06/17/2024 21:11:32 - INFO - llamafactory.extras.callbacks - {'loss': 0.4258, 'learning_rate': 8.2717e-06, 'epoch': 2.20, 'throughput': 781.03}
06/17/2024 21:12:19 - INFO - llamafactory.extras.callbacks - {'loss': 0.3912, 'learning_rate': 7.5084e-06, 'epoch': 2.24, 'throughput': 780.49}
06/17/2024 21:13:06 - INFO - llamafactory.extras.callbacks - {'loss': 0.3458, 'learning_rate': 6.7758e-06, 'epoch': 2.28, 'throughput': 780.17}
06/17/2024 21:13:55 - INFO - llamafactory.extras.callbacks - {'loss': 0.4255, 'learning_rate': 6.0751e-06, 'epoch': 2.32, 'throughput': 780.22}
06/17/2024 21:14:45 - INFO - llamafactory.extras.callbacks - {'loss': 0.4222, 'learning_rate': 5.4077e-06, 'epoch': 2.36, 'throughput': 780.80}
06/17/2024 21:15:33 - INFO - llamafactory.extras.callbacks - {'loss': 0.3990, 'learning_rate': 4.7746e-06, 'epoch': 2.40, 'throughput': 780.45}
06/17/2024 21:15:33 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-300
06/17/2024 21:15:34 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/17/2024 21:15:34 - INFO - transformers.configuration_utils - Model config Qwen2Config { "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 152064 }
06/17/2024 21:15:34 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-300/tokenizer_config.json
06/17/2024 21:15:34 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-300/special_tokens_map.json
06/17/2024 21:16:26 - INFO - llamafactory.extras.callbacks - {'loss': 0.3382, 'learning_rate': 4.1770e-06, 'epoch': 2.44, 'throughput': 780.00}
06/17/2024 21:17:20 - INFO - llamafactory.extras.callbacks - {'loss': 0.4465, 'learning_rate': 3.6159e-06, 'epoch': 2.48, 'throughput': 780.28}
06/17/2024 21:18:13 - INFO - llamafactory.extras.callbacks - {'loss': 0.3250, 'learning_rate': 3.0923e-06, 'epoch': 2.52, 'throughput': 779.93}
06/17/2024 21:19:04 - INFO - llamafactory.extras.callbacks - {'loss': 0.3920, 'learning_rate': 2.6072e-06, 'epoch': 2.56, 'throughput': 779.59}
06/17/2024 21:19:53 - INFO - llamafactory.extras.callbacks - {'loss': 0.3672, 'learning_rate': 2.1614e-06, 'epoch': 2.60, 'throughput': 779.34}
06/17/2024 21:20:43 - INFO - llamafactory.extras.callbacks - {'loss': 0.3554, 'learning_rate': 1.7556e-06, 'epoch': 2.64, 'throughput': 779.07}
06/17/2024 21:21:27 - INFO - llamafactory.extras.callbacks - {'loss': 0.3801, 'learning_rate': 1.3906e-06, 'epoch': 2.68, 'throughput': 778.76}
06/17/2024 21:22:20 - INFO - llamafactory.extras.callbacks - {'loss': 0.4350, 'learning_rate': 1.0670e-06, 'epoch': 2.72, 'throughput': 779.56}
06/17/2024 21:23:15 - INFO - llamafactory.extras.callbacks - {'loss': 0.4063, 'learning_rate': 7.8542e-07, 'epoch': 2.76, 'throughput': 779.43}
06/17/2024 21:24:15 - INFO - llamafactory.extras.callbacks - {'loss': 0.4894, 'learning_rate': 5.4631e-07, 'epoch': 2.80, 'throughput': 779.76}
06/17/2024 21:25:05 - INFO - llamafactory.extras.callbacks - {'loss': 0.3822, 'learning_rate': 3.5010e-07, 'epoch': 2.84, 'throughput': 779.50}
06/17/2024 21:25:57 - INFO - llamafactory.extras.callbacks - {'loss': 0.4028, 'learning_rate': 1.9713e-07, 'epoch': 2.88, 'throughput': 779.46}
06/17/2024 21:26:50 - INFO - llamafactory.extras.callbacks - {'loss': 0.4293, 'learning_rate': 8.7679e-08, 'epoch': 2.92, 'throughput': 779.71}
06/17/2024 21:27:36 - INFO - llamafactory.extras.callbacks - {'loss': 0.4280, 'learning_rate': 2.1929e-08, 'epoch': 2.96, 'throughput': 779.90}
06/17/2024 21:28:29 - INFO - llamafactory.extras.callbacks - {'loss': 0.4766, 'learning_rate': 0.0000e+00, 'epoch': 3.00, 'throughput': 779.83}
06/17/2024 21:28:29 - INFO - transformers.trainer - Training completed. Do not forget to share your model on huggingface.co/models =)
06/17/2024 21:28:29 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05
06/17/2024 21:28:30 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/17/2024 21:28:30 - INFO - transformers.configuration_utils - Model config Qwen2Config { "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 152064 }
06/17/2024 21:28:30 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/tokenizer_config.json
06/17/2024 21:28:30 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/special_tokens_map.json
06/17/2024 21:28:30 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
06/17/2024 21:28:30 - INFO - transformers.modelcard - Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
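The final adapter and tokenizer files land in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05; the closing "No metric eval_loss to plot" warning only means no validation split was configured, so just the training-loss curve is produced. For deployment without PEFT at inference time, the adapter can be folded into the base weights; a sketch assuming the standard merge_and_unload API (note the adapter was trained against the 4-bit model, so merging into full-precision weights is a close but not bit-exact reproduction):

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_dir = "saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05"

# Load the base model unquantized so the LoRA deltas can be merged into its weights.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B-Instruct", torch_dtype="auto")
model = PeftModel.from_pretrained(base, adapter_dir)
merged = model.merge_and_unload()  # fold the adapter into the base linear layers

merged.save_pretrained("qwen2-7b-toolcall-merged")  # hypothetical output directory
AutoTokenizer.from_pretrained(adapter_dir).save_pretrained("qwen2-7b-toolcall-merged")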