06/17/2024 20:24:18 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/17/2024 20:24:18 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
06/17/2024 20:24:22 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_en...
06/17/2024 20:24:27 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_zh...
06/17/2024 20:24:32 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_en...
06/17/2024 20:24:36 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/17/2024 20:24:36 - INFO - transformers.configuration_utils - Model config Qwen2Config { "_name_or_path": "Qwen/Qwen2-7B-Instruct", "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 152064 }
06/17/2024 20:24:36 - INFO - llamafactory.model.model_utils.quantization - Quantizing model to 4 bit.
06/17/2024 20:24:36 - INFO - transformers.modeling_utils - loading weights file model.safetensors from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/model.safetensors.index.json
06/17/2024 20:24:36 - INFO - transformers.modeling_utils - Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
06/17/2024 20:24:36 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig { "bos_token_id": 151643, "eos_token_id": 151645 }
06/17/2024 20:24:37 - INFO - llamafactory.model.model_utils.quantization - Quantizing model to 4 bit.
06/17/2024 20:25:04 - INFO - transformers.modeling_utils - All model checkpoint weights were used when initializing Qwen2ForCausalLM.
06/17/2024 20:25:04 - INFO - transformers.modeling_utils - All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2-7B-Instruct. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
06/17/2024 20:25:04 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
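The "Quantizing model to 4 bit" lines come from on-the-fly bitsandbytes quantization of the base model, i.e. a QLoRA-style load. A minimal sketch of an equivalent standalone load, assuming NF4 with double quantization and float16 compute (the exact quantization settings are not printed in the log):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed settings: the log only states "Quantizing model to 4 bit" and
# "Instantiating Qwen2ForCausalLM model under default dtype torch.float16".
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)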
06/17/2024 20:25:04 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.misc - Found linear modules: o_proj,k_proj,up_proj,gate_proj,v_proj,q_proj,down_proj
06/17/2024 20:25:04 - INFO - transformers.generation.configuration_utils - loading configuration file generation_config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/generation_config.json
06/17/2024 20:25:04 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig { "bos_token_id": 151643, "do_sample": true, "eos_token_id": [ 151645, 151643 ], "pad_token_id": 151643, "repetition_penalty": 1.05, "temperature": 0.7, "top_k": 20, "top_p": 0.8 }
06/17/2024 20:25:04 - INFO - llamafactory.model.loader - trainable params: 20185088 || all params: 7635801600 || trainable%: 0.2643
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
06/17/2024 20:25:04 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/17/2024 20:25:04 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/17/2024 20:25:04 - INFO - llamafactory.model.model_utils.misc - Found linear modules: o_proj,up_proj,down_proj,q_proj,v_proj,k_proj,gate_proj
06/17/2024 20:25:05 - INFO - llamafactory.model.loader - trainable params: 20185088 || all params: 7635801600 || trainable%: 0.2643
06/17/2024 20:25:05 - WARNING - accelerate.utils.other - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
06/17/2024 20:25:05 - INFO - transformers.trainer - Using auto half precision backend
06/17/2024 20:25:05 - INFO - transformers.trainer - ***** Running training *****
06/17/2024 20:25:05 - INFO - transformers.trainer - Num examples = 2,000
06/17/2024 20:25:05 - INFO - transformers.trainer - Num Epochs = 3
06/17/2024 20:25:05 - INFO - transformers.trainer - Instantaneous batch size per device = 1
06/17/2024 20:25:05 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 16
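LoRA is attached to all seven linear projections listed in the "Found linear modules" lines, and the reported 20,185,088 trainable parameters are consistent with rank-8 adapters (the LLaMA-Factory default): for Qwen2-7B, summing in_features + out_features over q/k/v/o/gate/up/down gives 90,112 per layer, and 8 x 90,112 x 28 layers = 20,185,088, about 0.26% of the 7.6B total. The duplicated adapter and loader lines are consistent with two training processes, which also matches the total train batch size of 16 (1 per device x 8 accumulation steps x 2 devices). A rough PEFT equivalent of the adapter setup, assuming rank 8, alpha 16 and no LoRA dropout (these values are not printed in the log):

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# `model` as loaded in the 4-bit sketch above. prepare_model_for_kbit_training
# corresponds to the "Gradient checkpointing enabled" / "Upcasting trainable
# params to float32" lines in the log.
model = prepare_model_for_kbit_training(model)

# Hypothetical values: only the target modules and the trainable-parameter
# count appear in the log; r=8 / lora_alpha=16 / lora_dropout=0.0 are assumed defaults.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# expected: roughly "trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643"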
parallel, distributed & accumulation) = 16 06/17/2024 20:25:05 - INFO - transformers.trainer - Gradient Accumulation steps = 8 06/17/2024 20:25:05 - INFO - transformers.trainer - Total optimization steps = 375 06/17/2024 20:25:05 - INFO - transformers.trainer - Number of trainable parameters = 20,185,088 06/17/2024 20:26:04 - INFO - llamafactory.extras.callbacks - {'loss': 0.6880, 'learning_rate': 4.9978e-05, 'epoch': 0.04, 'throughput': 807.59} 06/17/2024 20:26:54 - INFO - llamafactory.extras.callbacks - {'loss': 0.7630, 'learning_rate': 4.9912e-05, 'epoch': 0.08, 'throughput': 788.63} 06/17/2024 20:27:47 - INFO - llamafactory.extras.callbacks - {'loss': 0.6882, 'learning_rate': 4.9803e-05, 'epoch': 0.12, 'throughput': 782.46} 06/17/2024 20:28:41 - INFO - llamafactory.extras.callbacks - {'loss': 0.6951, 'learning_rate': 4.9650e-05, 'epoch': 0.16, 'throughput': 778.58} 06/17/2024 20:29:32 - INFO - llamafactory.extras.callbacks - {'loss': 0.5008, 'learning_rate': 4.9454e-05, 'epoch': 0.20, 'throughput': 776.76} 06/17/2024 20:30:18 - INFO - llamafactory.extras.callbacks - {'loss': 0.5420, 'learning_rate': 4.9215e-05, 'epoch': 0.24, 'throughput': 780.46} 06/17/2024 20:31:09 - INFO - llamafactory.extras.callbacks - {'loss': 0.5369, 'learning_rate': 4.8933e-05, 'epoch': 0.28, 'throughput': 781.27} 06/17/2024 20:32:05 - INFO - llamafactory.extras.callbacks - {'loss': 0.4948, 'learning_rate': 4.8609e-05, 'epoch': 0.32, 'throughput': 780.39} 06/17/2024 20:32:52 - INFO - llamafactory.extras.callbacks - {'loss': 0.5244, 'learning_rate': 4.8244e-05, 'epoch': 0.36, 'throughput': 778.70} 06/17/2024 20:33:37 - INFO - llamafactory.extras.callbacks - {'loss': 0.4210, 'learning_rate': 4.7839e-05, 'epoch': 0.40, 'throughput': 780.25} 06/17/2024 20:34:25 - INFO - llamafactory.extras.callbacks - {'loss': 0.4517, 'learning_rate': 4.7393e-05, 'epoch': 0.44, 'throughput': 779.83} 06/17/2024 20:35:19 - INFO - llamafactory.extras.callbacks - {'loss': 0.4661, 'learning_rate': 4.6908e-05, 'epoch': 0.48, 'throughput': 775.58} 06/17/2024 20:36:09 - INFO - llamafactory.extras.callbacks - {'loss': 0.4928, 'learning_rate': 4.6384e-05, 'epoch': 0.52, 'throughput': 775.62} 06/17/2024 20:37:00 - INFO - llamafactory.extras.callbacks - {'loss': 0.5424, 'learning_rate': 4.5823e-05, 'epoch': 0.56, 'throughput': 775.79} 06/17/2024 20:37:52 - INFO - llamafactory.extras.callbacks - {'loss': 0.5419, 'learning_rate': 4.5225e-05, 'epoch': 0.60, 'throughput': 774.15} 06/17/2024 20:38:39 - INFO - llamafactory.extras.callbacks - {'loss': 0.4558, 'learning_rate': 4.4592e-05, 'epoch': 0.64, 'throughput': 774.75} 06/17/2024 20:39:27 - INFO - llamafactory.extras.callbacks - {'loss': 0.5656, 'learning_rate': 4.3925e-05, 'epoch': 0.68, 'throughput': 776.75} 06/17/2024 20:40:18 - INFO - llamafactory.extras.callbacks - {'loss': 0.4832, 'learning_rate': 4.3224e-05, 'epoch': 0.72, 'throughput': 780.75} 06/17/2024 20:41:04 - INFO - llamafactory.extras.callbacks - {'loss': 0.4626, 'learning_rate': 4.2492e-05, 'epoch': 0.76, 'throughput': 781.15} 06/17/2024 20:41:56 - INFO - llamafactory.extras.callbacks - {'loss': 0.4837, 'learning_rate': 4.1728e-05, 'epoch': 0.80, 'throughput': 780.33} 06/17/2024 20:41:56 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100 06/17/2024 20:41:57 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at 
06/17/2024 20:41:57 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/17/2024 20:41:57 - INFO - transformers.configuration_utils - Model config Qwen2Config { "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 152064 }
06/17/2024 20:41:57 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100/tokenizer_config.json
06/17/2024 20:41:57 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100/special_tokens_map.json
06/17/2024 20:42:48 - INFO - llamafactory.extras.callbacks - {'loss': 0.5144, 'learning_rate': 4.0936e-05, 'epoch': 0.84, 'throughput': 779.83}
06/17/2024 20:43:33 - INFO - llamafactory.extras.callbacks - {'loss': 0.4930, 'learning_rate': 4.0115e-05, 'epoch': 0.88, 'throughput': 780.58}
06/17/2024 20:44:20 - INFO - llamafactory.extras.callbacks - {'loss': 0.4083, 'learning_rate': 3.9268e-05, 'epoch': 0.92, 'throughput': 781.80}
06/17/2024 20:45:14 - INFO - llamafactory.extras.callbacks - {'loss': 0.5172, 'learning_rate': 3.8396e-05, 'epoch': 0.96, 'throughput': 782.01}
06/17/2024 20:46:10 - INFO - llamafactory.extras.callbacks - {'loss': 0.5843, 'learning_rate': 3.7500e-05, 'epoch': 1.00, 'throughput': 782.22}
06/17/2024 20:46:58 - INFO - llamafactory.extras.callbacks - {'loss': 0.4567, 'learning_rate': 3.6582e-05, 'epoch': 1.04, 'throughput': 783.58}
06/17/2024 20:47:43 - INFO - llamafactory.extras.callbacks - {'loss': 0.4180, 'learning_rate': 3.5644e-05, 'epoch': 1.08, 'throughput': 784.96}
06/17/2024 20:48:37 - INFO - llamafactory.extras.callbacks - {'loss': 0.3785, 'learning_rate': 3.4688e-05, 'epoch': 1.12, 'throughput': 783.96}
06/17/2024 20:49:22 - INFO - llamafactory.extras.callbacks - {'loss': 0.4097, 'learning_rate': 3.3714e-05, 'epoch': 1.16, 'throughput': 783.11}
06/17/2024 20:50:09 - INFO - llamafactory.extras.callbacks - {'loss': 0.4507, 'learning_rate': 3.2725e-05, 'epoch': 1.20, 'throughput': 783.25}
06/17/2024 20:51:00 - INFO - llamafactory.extras.callbacks - {'loss': 0.3680, 'learning_rate': 3.1723e-05, 'epoch': 1.24, 'throughput': 782.23}
06/17/2024 20:51:53 - INFO - llamafactory.extras.callbacks - {'loss': 0.4301, 'learning_rate': 3.0709e-05, 'epoch': 1.28, 'throughput': 782.26}
06/17/2024 20:52:43 - INFO - llamafactory.extras.callbacks - {'loss': 0.4488, 'learning_rate': 2.9685e-05, 'epoch': 1.32, 'throughput': 781.89}
06/17/2024 20:53:33 - INFO - llamafactory.extras.callbacks - {'loss': 0.4075, 'learning_rate': 2.8652e-05, 'epoch': 1.36, 'throughput': 781.74}
06/17/2024 20:54:30 - INFO - llamafactory.extras.callbacks - {'loss': 0.4991, 'learning_rate': 2.7613e-05, 'epoch': 1.40, 'throughput': 781.86}
06/17/2024 20:55:20 - INFO - llamafactory.extras.callbacks - {'loss': 0.4894, 'learning_rate': 2.6570e-05, 'epoch': 1.44, 'throughput': 782.49}
06/17/2024 20:56:12 - INFO - llamafactory.extras.callbacks - {'loss': 0.4967, 'learning_rate': 2.5524e-05, 'epoch': 1.48, 'throughput': 782.06}
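Each "Saving model checkpoint" block above writes only the LoRA adapter weights plus the tokenizer files into the checkpoint directory, not a full copy of the 7B model. A sketch of sanity-checking such an intermediate checkpoint, assuming the usual PEFT adapter layout (adapter_config.json plus adapter weights) inside checkpoint-100:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100"

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = PeftModel.from_pretrained(base, ckpt)  # attach the saved LoRA adapter

# Hypothetical prompt in the style of the glaive_toolcall data.
messages = [{"role": "user", "content": "What is the weather like in Boston today?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))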
06/17/2024 20:57:06 - INFO - llamafactory.extras.callbacks - {'loss': 0.5297, 'learning_rate': 2.4476e-05, 'epoch': 1.52, 'throughput': 783.03}
06/17/2024 20:57:55 - INFO - llamafactory.extras.callbacks - {'loss': 0.3939, 'learning_rate': 2.3430e-05, 'epoch': 1.56, 'throughput': 781.82}
06/17/2024 20:58:49 - INFO - llamafactory.extras.callbacks - {'loss': 0.4610, 'learning_rate': 2.2387e-05, 'epoch': 1.60, 'throughput': 781.19}
06/17/2024 20:58:49 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-200
06/17/2024 20:58:50 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/17/2024 20:58:50 - INFO - transformers.configuration_utils - Model config Qwen2Config { "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 152064 }
06/17/2024 20:58:50 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-200/tokenizer_config.json
06/17/2024 20:58:50 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-200/special_tokens_map.json
06/17/2024 20:59:48 - INFO - llamafactory.extras.callbacks - {'loss': 0.4622, 'learning_rate': 2.1348e-05, 'epoch': 1.64, 'throughput': 779.87}
06/17/2024 21:00:36 - INFO - llamafactory.extras.callbacks - {'loss': 0.4043, 'learning_rate': 2.0315e-05, 'epoch': 1.68, 'throughput': 779.61}
06/17/2024 21:01:25 - INFO - llamafactory.extras.callbacks - {'loss': 0.4280, 'learning_rate': 1.9291e-05, 'epoch': 1.72, 'throughput': 779.45}
06/17/2024 21:02:14 - INFO - llamafactory.extras.callbacks - {'loss': 0.3779, 'learning_rate': 1.8277e-05, 'epoch': 1.76, 'throughput': 778.48}
06/17/2024 21:03:05 - INFO - llamafactory.extras.callbacks - {'loss': 0.4526, 'learning_rate': 1.7275e-05, 'epoch': 1.80, 'throughput': 779.26}
06/17/2024 21:03:56 - INFO - llamafactory.extras.callbacks - {'loss': 0.4627, 'learning_rate': 1.6286e-05, 'epoch': 1.84, 'throughput': 779.12}
06/17/2024 21:04:48 - INFO - llamafactory.extras.callbacks - {'loss': 0.4873, 'learning_rate': 1.5312e-05, 'epoch': 1.88, 'throughput': 779.09}
06/17/2024 21:05:40 - INFO - llamafactory.extras.callbacks - {'loss': 0.3234, 'learning_rate': 1.4356e-05, 'epoch': 1.92, 'throughput': 780.05}
06/17/2024 21:06:28 - INFO - llamafactory.extras.callbacks - {'loss': 0.4438, 'learning_rate': 1.3418e-05, 'epoch': 1.96, 'throughput': 780.37}
06/17/2024 21:07:21 - INFO - llamafactory.extras.callbacks - {'loss': 0.4407, 'learning_rate': 1.2500e-05, 'epoch': 2.00, 'throughput': 779.97}
06/17/2024 21:08:15 - INFO - llamafactory.extras.callbacks - {'loss': 0.4401, 'learning_rate': 1.1604e-05, 'epoch': 2.04, 'throughput': 779.51}
06/17/2024 21:09:04 - INFO - llamafactory.extras.callbacks - {'loss': 0.3771, 'learning_rate': 1.0732e-05, 'epoch': 2.08, 'throughput': 780.06}
06/17/2024 21:09:57 - INFO - llamafactory.extras.callbacks - {'loss': 0.4043, 'learning_rate': 9.8850e-06, 'epoch': 2.12, 'throughput': 781.03}
06/17/2024 21:10:42 - INFO - llamafactory.extras.callbacks - {'loss': 0.4018, 'learning_rate': 9.0644e-06, 'epoch': 2.16, 'throughput': 781.14}
06/17/2024 21:11:32 - INFO - llamafactory.extras.callbacks - {'loss': 0.4258, 'learning_rate': 8.2717e-06, 'epoch': 2.20, 'throughput': 781.03}
06/17/2024 21:12:19 - INFO - llamafactory.extras.callbacks - {'loss': 0.3912, 'learning_rate': 7.5084e-06, 'epoch': 2.24, 'throughput': 780.49}
06/17/2024 21:13:06 - INFO - llamafactory.extras.callbacks - {'loss': 0.3458, 'learning_rate': 6.7758e-06, 'epoch': 2.28, 'throughput': 780.17}
06/17/2024 21:13:55 - INFO - llamafactory.extras.callbacks - {'loss': 0.4255, 'learning_rate': 6.0751e-06, 'epoch': 2.32, 'throughput': 780.22}
06/17/2024 21:14:45 - INFO - llamafactory.extras.callbacks - {'loss': 0.4222, 'learning_rate': 5.4077e-06, 'epoch': 2.36, 'throughput': 780.80}
06/17/2024 21:15:33 - INFO - llamafactory.extras.callbacks - {'loss': 0.3990, 'learning_rate': 4.7746e-06, 'epoch': 2.40, 'throughput': 780.45}
06/17/2024 21:15:33 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-300
06/17/2024 21:15:34 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/17/2024 21:15:34 - INFO - transformers.configuration_utils - Model config Qwen2Config { "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 152064 }
06/17/2024 21:15:34 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-300/tokenizer_config.json
06/17/2024 21:15:34 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-300/special_tokens_map.json
06/17/2024 21:16:26 - INFO - llamafactory.extras.callbacks - {'loss': 0.3382, 'learning_rate': 4.1770e-06, 'epoch': 2.44, 'throughput': 780.00}
06/17/2024 21:17:20 - INFO - llamafactory.extras.callbacks - {'loss': 0.4465, 'learning_rate': 3.6159e-06, 'epoch': 2.48, 'throughput': 780.28}
06/17/2024 21:18:13 - INFO - llamafactory.extras.callbacks - {'loss': 0.3250, 'learning_rate': 3.0923e-06, 'epoch': 2.52, 'throughput': 779.93}
06/17/2024 21:19:04 - INFO - llamafactory.extras.callbacks - {'loss': 0.3920, 'learning_rate': 2.6072e-06, 'epoch': 2.56, 'throughput': 779.59}
06/17/2024 21:19:53 - INFO - llamafactory.extras.callbacks - {'loss': 0.3672, 'learning_rate': 2.1614e-06, 'epoch': 2.60, 'throughput': 779.34}
06/17/2024 21:20:43 - INFO - llamafactory.extras.callbacks - {'loss': 0.3554, 'learning_rate': 1.7556e-06, 'epoch': 2.64, 'throughput': 779.07}
06/17/2024 21:21:27 - INFO - llamafactory.extras.callbacks - {'loss': 0.3801, 'learning_rate': 1.3906e-06, 'epoch': 2.68, 'throughput': 778.76}
06/17/2024 21:22:20 - INFO - llamafactory.extras.callbacks - {'loss': 0.4350, 'learning_rate': 1.0670e-06, 'epoch': 2.72, 'throughput': 779.56}
06/17/2024 21:23:15 - INFO - llamafactory.extras.callbacks - {'loss': 0.4063, 'learning_rate': 7.8542e-07, 'epoch': 2.76, 'throughput': 779.43}
06/17/2024 21:24:15 - INFO - llamafactory.extras.callbacks - {'loss': 0.4894, 'learning_rate': 5.4631e-07, 'epoch': 2.80, 'throughput': 779.76}
06/17/2024 21:25:05 - INFO - llamafactory.extras.callbacks - {'loss': 0.3822, 'learning_rate': 3.5010e-07, 'epoch': 2.84, 'throughput': 779.50}
06/17/2024 21:25:57 - INFO - llamafactory.extras.callbacks - {'loss': 0.4028, 'learning_rate': 1.9713e-07, 'epoch': 2.88, 'throughput': 779.46}
06/17/2024 21:26:50 - INFO - llamafactory.extras.callbacks - {'loss': 0.4293, 'learning_rate': 8.7679e-08, 'epoch': 2.92, 'throughput': 779.71}
06/17/2024 21:27:36 - INFO - llamafactory.extras.callbacks - {'loss': 0.4280, 'learning_rate': 2.1929e-08, 'epoch': 2.96, 'throughput': 779.90}
06/17/2024 21:28:29 - INFO - llamafactory.extras.callbacks - {'loss': 0.4766, 'learning_rate': 0.0000e+00, 'epoch': 3.00, 'throughput': 779.83}
06/17/2024 21:28:29 - INFO - transformers.trainer - Training completed. Do not forget to share your model on huggingface.co/models =)
06/17/2024 21:28:29 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05
06/17/2024 21:28:30 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/17/2024 21:28:30 - INFO - transformers.configuration_utils - Model config Qwen2Config { "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 152064 }
06/17/2024 21:28:30 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/tokenizer_config.json
06/17/2024 21:28:30 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05/special_tokens_map.json
06/17/2024 21:28:30 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
06/17/2024 21:28:30 - INFO - transformers.modelcard - Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
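The final adapter and tokenizer files land in saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05; the closing "No metric eval_loss to plot" warning only means no validation split was configured, so just the training-loss curve is produced. For deployment without PEFT at inference time, the adapter can be folded into the base weights; a sketch assuming the standard merge_and_unload API (note the adapter was trained against the 4-bit model, so merging into full-precision weights is a close but not bit-exact reproduction):

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_dir = "saves/Qwen2-7B-Chat/lora/train_2024-06-17-19-49-05"

# Load the base model unquantized so the LoRA deltas can be merged into its weights.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B-Instruct", torch_dtype="auto")
model = PeftModel.from_pretrained(base, adapter_dir)
merged = model.merge_and_unload()  # fold the adapter into the base linear layers

merged.save_pretrained("qwen2-7b-toolcall-merged")  # hypothetical output directory
AutoTokenizer.from_pretrained(adapter_dir).save_pretrained("qwen2-7b-toolcall-merged")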