[2023-12-10 15:26:50,373] torch.distributed.run: [WARNING] master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
[2023-12-10 15:26:50,373] torch.distributed.run: [WARNING]
[2023-12-10 15:26:50,373] torch.distributed.run: [WARNING] *****************************************
[2023-12-10 15:26:50,373] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2023-12-10 15:26:50,373] torch.distributed.run: [WARNING] *****************************************
12/10/2023 15:26:54 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
12/10/2023 15:26:54 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=2,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0001,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=output/text-20231210-152648-1e-4/runs/Dec10_15-26-53_lily-gpu07,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=500,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
output_dir=output/text-20231210-152648-1e-4,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=1,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=output/text-20231210-152648-1e-4,
save_on_each_node=False,
save_safetensors=False,
save_steps=50,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
[INFO|tokenization_utils_base.py:2043] 2023-12-10 15:26:55,236 >> loading file tokenizer.model from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/tokenizer.model
[INFO|tokenization_utils_base.py:2043] 2023-12-10 15:26:55,236 >> loading file added_tokens.json from cache at None
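The arguments above fix the training budget, and a little arithmetic makes it concrete. Below is a minimal sketch that copies the values from the dump. It assumes a world size of 2, which is inferred from the "Process rank: 0" and "Process rank: 1" lines in this log and is not printed anywhere as a number. The `linear_lr` helper is written to follow the usual linear warmup-then-decay rule for `lr_scheduler_type=linear`. It is an illustration, not the Trainer's own code.

```python
# Values copied from the Seq2SeqTrainingArguments dump above.
PER_DEVICE_TRAIN_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 2
MAX_STEPS = 500
LEARNING_RATE = 1e-4
WARMUP_STEPS = 0
SAVE_STEPS = 50

# Assumption: two ranks, inferred from the "Process rank: 0" and
# "Process rank: 1" lines; adjust if the job launched more processes.
WORLD_SIZE = 2

# From the "Train dataset size: 52002" line later in this log.
TRAIN_DATASET_SIZE = 52002


def effective_batch_size() -> int:
    """Examples consumed per optimizer step, summed over all ranks."""
    return PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * WORLD_SIZE


def linear_lr(step: int) -> float:
    """Learning rate at `step` under a linear warmup-then-decay schedule."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * (step / max(1, WARMUP_STEPS))
    remaining = max(0, MAX_STEPS - step)
    return LEARNING_RATE * (remaining / max(1, MAX_STEPS - WARMUP_STEPS))


if __name__ == "__main__":
    ebs = effective_batch_size()
    seen = ebs * MAX_STEPS
    print(f"effective batch size: {ebs}")
    print(f"examples seen: {seen} ({seen / TRAIN_DATASET_SIZE:.2%} of one epoch)")
    for step in (0, 250, 500):
        print(f"lr at step {step}: {linear_lr(step):.1e}")
    print("checkpoint steps:", list(range(SAVE_STEPS, MAX_STEPS + 1, SAVE_STEPS)))
```

Under these assumptions each optimizer step consumes 4 examples. The 500 steps therefore cover 2000 of the 52002 training examples, about 3.85% of one epoch. Because `max_steps` is positive, it overrides `num_train_epochs=3.0`. With `save_steps=50` and `save_total_limit=None`, the run writes a checkpoint every 50 steps and keeps all of them.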
[INFO|tokenization_utils_base.py:2043] 2023-12-10 15:26:55,236 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2043] 2023-12-10 15:26:55,236 >> loading file tokenizer_config.json from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/tokenizer_config.json
[INFO|tokenization_utils_base.py:2043] 2023-12-10 15:26:55,236 >> loading file tokenizer.json from cache at None
[INFO|configuration_utils.py:715] 2023-12-10 15:26:55,589 >> loading configuration file config.json from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/config.json
[INFO|configuration_utils.py:715] 2023-12-10 15:26:55,852 >> loading configuration file config.json from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/config.json
[INFO|configuration_utils.py:775] 2023-12-10 15:26:55,853 >> Model config ChatGLMConfig {
  "_name_or_path": "THUDM/chatglm3-6b-base",
  "add_bias_linear": false,
  "add_qkv_bias": true,
  "apply_query_key_layer_scaling": true,
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "ChatGLMModel"
  ],
  "attention_dropout": 0.0,
  "attention_softmax_in_fp32": true,
  "auto_map": {
    "AutoConfig": "THUDM/chatglm3-6b-base--configuration_chatglm.ChatGLMConfig",
    "AutoModel": "THUDM/chatglm3-6b-base--modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForCausalLM": "THUDM/chatglm3-6b-base--modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "THUDM/chatglm3-6b-base--modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSequenceClassification": "THUDM/chatglm3-6b-base--modeling_chatglm.ChatGLMForSequenceClassification"
  },
  "bias_dropout_fusion": true,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "ffn_hidden_size": 13696,
  "fp32_residual_connection": false,
  "hidden_dropout": 0.0,
  "hidden_size": 4096,
  "kv_channels": 128,
  "layernorm_epsilon": 1e-05,
  "model_type": "chatglm",
  "multi_query_attention": true,
  "multi_query_group_num": 2,
  "num_attention_heads": 32,
  "num_layers": 28,
  "original_rope": true,
  "pad_token_id": 0,
  "padded_vocab_size": 65024,
  "post_layer_norm": true,
  "pre_seq_len": null,
  "prefix_projection": false,
  "quantization_bit": 0,
  "rmsnorm": true,
  "seq_length": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.34.0",
  "use_cache": true,
  "vocab_size": 65024
}
12/10/2023 15:26:55 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: False
[INFO|modeling_utils.py:2993] 2023-12-10 15:26:56,183 >> loading weights file pytorch_model.bin from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/pytorch_model.bin.index.json
[INFO|configuration_utils.py:770] 2023-12-10 15:26:56,185 >> Generate config GenerationConfig {
  "eos_token_id": 2,
  "pad_token_id": 0
}
Loading checkpoint shards: 0%| | 0/7 [00:00> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.
[INFO|modeling_utils.py:3783] 2023-12-10 15:27:09,564 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at THUDM/chatglm3-6b-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
[INFO|modeling_utils.py:3352] 2023-12-10 15:27:09,818 >> Generation config file not found, using a generation config created from the model config.
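The config sets `"multi_query_attention": true` with `"multi_query_group_num": 2`. In this layout the 32 query heads share only 2 key/value heads, so the KV cache is much smaller than in full multi-head attention. The sketch below estimates the size of the fused QKV projection and the per-sequence KV cache from the printed fields. It assumes that each group stores one key vector and one value vector of width `kv_channels` per layer, and that values are stored as float16 (2 bytes each, matching `"torch_dtype": "float16"`). The names `AttentionShape`, `fused_qkv_width` and `kv_cache_bytes` are hypothetical helpers and are not part of the ChatGLM code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttentionShape:
    """Attention-related fields copied from the ChatGLMConfig dump above."""
    num_attention_heads: int = 32
    kv_channels: int = 128          # width of one attention head
    multi_query_group_num: int = 2  # K/V heads shared by all query heads
    num_layers: int = 28
    seq_length: int = 32768


def fused_qkv_width(cfg: AttentionShape) -> int:
    """Output width of a fused QKV projection: all query heads, plus one
    key slice and one value slice per group (assumed layout)."""
    query = cfg.num_attention_heads * cfg.kv_channels
    key_value = 2 * cfg.multi_query_group_num * cfg.kv_channels
    return query + key_value


def kv_cache_bytes(cfg: AttentionShape, tokens: int,
                   kv_heads: Optional[int] = None, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache K and V for one sequence of `tokens` tokens.
    bytes_per_value=2 assumes float16 storage."""
    heads = cfg.multi_query_group_num if kv_heads is None else kv_heads
    # Factor 2 covers the separate key and value tensors.
    return 2 * heads * cfg.kv_channels * cfg.num_layers * tokens * bytes_per_value


if __name__ == "__main__":
    cfg = AttentionShape()
    print("fused QKV width:", fused_qkv_width(cfg))
    grouped = kv_cache_bytes(cfg, cfg.seq_length)
    full = kv_cache_bytes(cfg, cfg.seq_length, kv_heads=cfg.num_attention_heads)
    print(f"KV cache at {cfg.seq_length} tokens: {grouped / 2**30:.3f} GiB "
          f"(vs {full / 2**30:.3f} GiB with one K/V head per query head)")
```

Under these assumptions the fused projection is 4096 + 512 = 4608 wide. A full 32768-token sequence needs about 0.875 GiB of KV cache, compared with 14 GiB if every query head had its own K/V head.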
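In the Sanity Check dump in this log, each entry reads `'piece': input_id -> label`. The prompt tokens (`[gMASK]`, `sop`, and the "Instruction: ... Answer:" text) map to -100. The answer tokens and the EOS id 2 keep their own ids as labels. The padding id 0 maps back to -100. Since -100 is the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`, only the answer and EOS positions contribute to the loss. The sketch below reproduces that masking pattern in plain Python, assuming the prompt and answer are already tokenized. `build_labels` is a hypothetical helper, and the fine-tuning script's own truncation rules are not shown in this log.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
PAD_ID = 0           # "pad_token_id": 0 in the model config
EOS_ID = 2           # "eos_token_id": 2 in the model config


def build_labels(prompt_ids: List[int], answer_ids: List[int],
                 max_len: int) -> Tuple[List[int], List[int]]:
    """Concatenate prompt + answer + EOS, right-pad to max_len, and mask
    every position except the answer and EOS with IGNORE_INDEX."""
    input_ids = prompt_ids + answer_ids + [EOS_ID]
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids + [EOS_ID]
    # Truncate inputs and labels together so positions stay aligned.
    input_ids = input_ids[:max_len]
    labels = labels[:max_len]
    pad = max_len - len(input_ids)
    input_ids = input_ids + [PAD_ID] * pad
    labels = labels + [IGNORE_INDEX] * pad
    return input_ids, labels


if __name__ == "__main__":
    # Shortened example using ids that appear in the Sanity Check dump.
    prompt = [64790, 64792, 29101, 30954]  # '[gMASK]', 'sop', 'Instruction', ':'
    answer = [30910, 30939, 30930]         # '', '1', '.'
    ids, labels = build_labels(prompt, answer, max_len=10)
    for t, m in zip(ids, labels):
        print(f"{t:6d} -> {m:6d}")
```

The printed pairs follow the same shape as the log: four masked prompt positions, the answer and EOS kept, and two masked padding positions at the end.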
Loading checkpoint shards: 100%|██████████| 7/7 [00:12<00:00, 1.64s/it] Loading checkpoint shards: 100%|██████████| 7/7 [00:12<00:00, 1.81s/it] Train dataset size: 52002 Sanity Check >>>>>>>>>>>>> '[gMASK]': 64790 -> -100 'sop': 64792 -> -100 'Instruction': 29101 -> -100 ':': 30954 -> -100 'Give': 10465 -> -100 'three': 1194 -> -100 'tips': 6639 -> -100 'for': 332 -> -100 'staying': 10061 -> -100 'healthy': 4651 -> -100 '.': 30930 -> -100 '\n': 13 -> -100 'An': 4244 -> -100 'sw': 1902 -> -100 'er': 266 -> -100 ':': 30954 -> -100 '': 30910 -> -100 '': 30910 -> 30910 '1': 30939 -> 30939 '.': 30930 -> 30930 'E': 30950 -> 30950 'at': 269 -> 269 'a': 260 -> 260 'balanced': 12949 -> 12949 'diet': 5546 -> 5546 'and': 293 -> 293 'make': 794 -> 794 'sure': 1506 -> 1506 'to': 289 -> 289 'include': 1860 -> 1860 'plenty': 5765 -> 5765 'of': 290 -> 290 'fruits': 13665 -> 13665 'and': 293 -> 293 'vegetables': 11567 -> 11567 '.': 30930 -> 30930 '': 30910 -> 30910 '\n': 13 -> 13 '2': 30943 -> 30943 '.': 30930 -> 30930 'Exercise': 23340 -> 23340 'regularly': 7414 -> 7414 'to': 289 -> 289 'keep': 1407 -> 1407 'your': 475 -> 475 'body': 1934 -> 1934 'active': 4047 -> 4047 'and': 293 -> 293 'strong': 2034 -> 2034 '.': 30930 -> 30930 '': 30910 -> 30910 '\n': 13 -> 13 '3': 30966 -> 30966 '.': 30930 -> 30930 'Get': 3286 -> 3286 'enough': 1775 -> 1775 'sleep': 4039 -> 4039 'and': 293 -> 293 'maintain': 3165 -> 3165 'a': 260 -> 260 'consistent': 7096 -> 7096 'sleep': 4039 -> 4039 'schedule': 5821 -> 5821 '.': 30930 -> 30930 '': 2 -> 2 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 
'': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> 
-100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 
-> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 
0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 
'': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> 
-100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 
-> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 
0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 <<<<<<<<<<<<< Sanity Check Train dataset size: 52002 Sanity Check >>>>>>>>>>>>> '[gMASK]': 64790 -> -100 'sop': 64792 -> -100 'Instruction': 29101 -> -100 ':': 30954 -> -100 'Give': 10465 -> -100 'three': 1194 -> -100 'tips': 6639 -> -100 'for': 332 -> -100 'staying': 10061 -> -100 'healthy': 4651 -> -100 '.': 30930 -> -100 '\n': 13 -> -100 'An': 4244 -> -100 'sw': 1902 -> -100 'er': 266 -> -100 ':': 30954 -> -100 '': 30910 -> -100 '': 30910 -> 30910 '1': 30939 -> 30939 '.': 30930 -> 30930 'E': 30950 -> 30950 'at': 269 -> 269 'a': 260 -> 260 'balanced': 12949 -> 12949 'diet': 5546 -> 5546 'and': 293 -> 293 'make': 794 -> 794 'sure': 1506 -> 1506 'to': 289 -> 289 'include': 1860 -> 1860 'plenty': 5765 -> 5765 'of': 290 -> 290 'fruits': 13665 -> 13665 'and': 293 -> 293 'vegetables': 11567 -> 11567 '.': 30930 -> 30930 '': 30910 -> 30910 '\n': 13 -> 13 '2': 30943 -> 30943 '.': 30930 -> 30930 'Exercise': 23340 -> 23340 'regularly': 7414 -> 7414 'to': 289 -> 289 'keep': 1407 -> 1407 'your': 475 -> 475 'body': 1934 -> 1934 'active': 4047 -> 4047 'and': 293 -> 293 'strong': 2034 -> 2034 '.': 30930 -> 30930 '': 30910 -> 30910 '\n': 13 -> 13 '3': 30966 -> 30966 '.': 30930 -> 30930 'Get': 3286 -> 3286 'enough': 
1775 -> 1775 'sleep': 4039 -> 4039 'and': 293 -> 293 'maintain': 3165 -> 3165 'a': 260 -> 260 'consistent': 7096 -> 7096 'sleep': 4039 -> 4039 'schedule': 5821 -> 5821 '.': 30930 -> 30930 '': 2 -> 2 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> 
-100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 
-> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 
0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 
'': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> 
-100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 
-> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 
0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 '': 0 -> -100 <<<<<<<<<<<<< Sanity Check [INFO|trainer.py:576] 2023-12-10 15:27:18,453 >> max_steps is given, it will override any value given in num_train_epochs [INFO|trainer.py:1760] 2023-12-10 15:27:20,364 >> ***** Running training ***** [INFO|trainer.py:1761] 2023-12-10 15:27:20,364 >> Num examples = 52,002 [INFO|trainer.py:1762] 2023-12-10 15:27:20,364 >> Num Epochs = 1 [INFO|trainer.py:1763] 2023-12-10 15:27:20,364 >> Instantaneous batch size per device = 1 [INFO|trainer.py:1766] 2023-12-10 15:27:20,364 >> Total train batch size (w. 
parallel, distributed & accumulation) = 4 [INFO|trainer.py:1767] 2023-12-10 15:27:20,365 >> Gradient Accumulation steps = 2 [INFO|trainer.py:1768] 2023-12-10 15:27:20,365 >> Total optimization steps = 500 [INFO|trainer.py:1769] 2023-12-10 15:27:20,366 >> Number of trainable parameters = 1,949,696 0%| | 0/500 [00:00> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-50/tokenizer_config.json [INFO|tokenization_utils_base.py:2446] 2023-12-10 15:28:26,862 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-50/special_tokens_map.json 10%|█ | 51/500 [01:06<11:00, 1.47s/it] {'loss': 0.0, 'learning_rate': 8.98e-05, 'epoch': 0.0} 10%|█ | 51/500 [01:06<11:00, 1.47s/it] 10%|█ | 52/500 [01:07<11:01, 1.48s/it] {'loss': 0.0, 'learning_rate': 8.960000000000001e-05, 'epoch': 0.0} 10%|█ | 52/500 [01:07<11:01, 1.48s/it] 11%|█ | 53/500 [01:09<10:57, 1.47s/it] {'loss': 0.0, 'learning_rate': 8.94e-05, 'epoch': 0.0} 11%|█ | 53/500 [01:09<10:57, 1.47s/it] 11%|█ | 54/500 [01:10<10:57, 1.47s/it] {'loss': 0.0, 'learning_rate': 8.92e-05, 'epoch': 0.0} 11%|█ | 54/500 [01:10<10:57, 1.47s/it] 11%|█ | 55/500 [01:12<10:52, 1.47s/it] {'loss': 0.0, 'learning_rate': 8.900000000000001e-05, 'epoch': 0.0} 11%|█ | 55/500 [01:12<10:52, 1.47s/it] 11%|█ | 56/500 [01:13<10:48, 1.46s/it] {'loss': 0.0, 'learning_rate': 8.88e-05, 'epoch': 0.0} 11%|█ | 56/500 [01:13<10:48, 1.46s/it] 11%|█▏ | 57/500 [01:14<10:42, 1.45s/it] {'loss': 0.0, 'learning_rate': 8.86e-05, 'epoch': 0.0} 11%|█▏ | 57/500 [01:14<10:42, 1.45s/it] 12%|█▏ | 58/500 [01:16<10:20, 1.40s/it] {'loss': 0.0, 'learning_rate': 8.840000000000001e-05, 'epoch': 0.0} 12%|█▏ | 58/500 [01:16<10:20, 1.40s/it] 12%|█▏ | 59/500 [01:17<09:11, 1.25s/it] {'loss': 0.0, 'learning_rate': 8.82e-05, 'epoch': 0.0} 12%|█▏ | 59/500 [01:17<09:11, 1.25s/it] 12%|█▏ | 60/500 [01:17<08:25, 1.15s/it] {'loss': 0.0, 'learning_rate': 8.800000000000001e-05, 'epoch': 0.0} 12%|█▏ | 60/500 [01:17<08:25, 1.15s/it] 12%|█▏ | 61/500 
[01:18<07:51, 1.07s/it] {'loss': 0.0, 'learning_rate': 8.78e-05, 'epoch': 0.0} 12%|█▏ | 61/500 [01:18<07:51, 1.07s/it] 12%|█▏ | 62/500 [01:19<07:29, 1.03s/it] {'loss': 0.0, 'learning_rate': 8.76e-05, 'epoch': 0.0} 12%|█▏ | 62/500 [01:19<07:29, 1.03s/it] 13%|█▎ | 63/500 [01:20<07:11, 1.01it/s] {'loss': 0.0, 'learning_rate': 8.740000000000001e-05, 'epoch': 0.0} 13%|█▎ | 63/500 [01:20<07:11, 1.01it/s] 13%|█▎ | 64/500 [01:21<06:57, 1.05it/s] {'loss': 0.0, 'learning_rate': 8.72e-05, 'epoch': 0.0} 13%|█▎ | 64/500 [01:21<06:57, 1.05it/s] 13%|█▎ | 65/500 [01:22<06:52, 1.05it/s] {'loss': 0.0, 'learning_rate': 8.7e-05, 'epoch': 0.0} 13%|█▎ | 65/500 [01:22<06:52, 1.05it/s] 13%|█▎ | 66/500 [01:23<07:08, 1.01it/s] {'loss': 0.0, 'learning_rate': 8.680000000000001e-05, 'epoch': 0.01} 13%|█▎ | 66/500 [01:23<07:08, 1.01it/s] 13%|█▎ | 67/500 [01:24<07:43, 1.07s/it] {'loss': 0.0, 'learning_rate': 8.66e-05, 'epoch': 0.01} 13%|█▎ | 67/500 [01:24<07:43, 1.07s/it] 14%|█▎ | 68/500 [01:26<08:16, 1.15s/it] {'loss': 0.0, 'learning_rate': 8.64e-05, 'epoch': 0.01} 14%|█▎ | 68/500 [01:26<08:16, 1.15s/it] 14%|█▍ | 69/500 [01:27<08:38, 1.20s/it] {'loss': 0.0, 'learning_rate': 8.620000000000001e-05, 'epoch': 0.01} 14%|█▍ | 69/500 [01:27<08:38, 1.20s/it] 14%|█▍ | 70/500 [01:28<08:32, 1.19s/it] {'loss': 0.0, 'learning_rate': 8.6e-05, 'epoch': 0.01} 14%|█▍ | 70/500 [01:28<08:32, 1.19s/it] 14%|█▍ | 71/500 [01:29<07:31, 1.05s/it] {'loss': 0.0, 'learning_rate': 8.58e-05, 'epoch': 0.01} 14%|█▍ | 71/500 [01:29<07:31, 1.05s/it] 14%|█▍ | 72/500 [01:29<06:12, 1.15it/s] {'loss': 0.0, 'learning_rate': 8.560000000000001e-05, 'epoch': 0.01} 14%|█▍ | 72/500 [01:29<06:12, 1.15it/s] 15%|█▍ | 73/500 [01:30<05:17, 1.34it/s] {'loss': 0.0, 'learning_rate': 8.54e-05, 'epoch': 0.01} 15%|█▍ | 73/500 [01:30<05:17, 1.34it/s] 15%|█▍ | 74/500 [01:30<04:39, 1.53it/s] {'loss': 0.0, 'learning_rate': 8.52e-05, 'epoch': 0.01} 15%|█▍ | 74/500 [01:30<04:39, 1.53it/s] 15%|█▌ | 75/500 [01:31<04:12, 1.68it/s] {'loss': 0.0, 
'learning_rate': 8.5e-05, 'epoch': 0.01} 15%|█▌ | 75/500 [01:31<04:12, 1.68it/s] 15%|█▌ | 76/500 [01:31<03:53, 1.82it/s] {'loss': 0.0, 'learning_rate': 8.48e-05, 'epoch': 0.01} 15%|█▌ | 76/500 [01:31<03:53, 1.82it/s] 15%|█▌ | 77/500 [01:32<03:40, 1.92it/s] {'loss': 0.0, 'learning_rate': 8.46e-05, 'epoch': 0.01} 15%|█▌ | 77/500 [01:32<03:40, 1.92it/s] 16%|█▌ | 78/500 [01:32<03:30, 2.00it/s] {'loss': 0.0, 'learning_rate': 8.44e-05, 'epoch': 0.01} 16%|█▌ | 78/500 [01:32<03:30, 2.00it/s] 16%|█▌ | 79/500 [01:33<03:23, 2.07it/s] {'loss': 0.0, 'learning_rate': 8.42e-05, 'epoch': 0.01} 16%|█▌ | 79/500 [01:33<03:23, 2.07it/s] 16%|█▌ | 80/500 [01:33<03:20, 2.10it/s] {'loss': 0.0, 'learning_rate': 8.4e-05, 'epoch': 0.01} 16%|█▌ | 80/500 [01:33<03:20, 2.10it/s] 16%|█▌ | 81/500 [01:33<03:17, 2.12it/s] {'loss': 0.0, 'learning_rate': 8.38e-05, 'epoch': 0.01} 16%|█▌ | 81/500 [01:33<03:17, 2.12it/s] 16%|█▋ | 82/500 [01:34<03:15, 2.14it/s] {'loss': 0.0, 'learning_rate': 8.36e-05, 'epoch': 0.01} 16%|█▋ | 82/500 [01:34<03:15, 2.14it/s] 17%|█▋ | 83/500 [01:34<03:14, 2.15it/s] {'loss': 0.0, 'learning_rate': 8.34e-05, 'epoch': 0.01} 17%|█▋ | 83/500 [01:34<03:14, 2.15it/s] 17%|█▋ | 84/500 [01:35<03:13, 2.15it/s] {'loss': 0.0, 'learning_rate': 8.32e-05, 'epoch': 0.01} 17%|█▋ | 84/500 [01:35<03:13, 2.15it/s] 17%|█▋ | 85/500 [01:35<03:12, 2.16it/s] {'loss': 0.0, 'learning_rate': 8.3e-05, 'epoch': 0.01} 17%|█▋ | 85/500 [01:35<03:12, 2.16it/s] 17%|█▋ | 86/500 [01:36<03:12, 2.16it/s] {'loss': 0.0, 'learning_rate': 8.28e-05, 'epoch': 0.01} 17%|█▋ | 86/500 [01:36<03:12, 2.16it/s] 17%|█▋ | 87/500 [01:36<03:10, 2.16it/s] {'loss': 0.0, 'learning_rate': 8.26e-05, 'epoch': 0.01} 17%|█▋ | 87/500 [01:36<03:10, 2.16it/s] 18%|█▊ | 88/500 [01:37<03:10, 2.16it/s] {'loss': 0.0, 'learning_rate': 8.24e-05, 'epoch': 0.01} 18%|█▊ | 88/500 [01:37<03:10, 2.16it/s] 18%|█▊ | 89/500 [01:37<03:09, 2.17it/s] {'loss': 0.0, 'learning_rate': 8.22e-05, 'epoch': 0.01} 18%|█▊ | 89/500 [01:37<03:09, 2.17it/s] 18%|█▊ | 90/500 
[01:38<03:09, 2.17it/s] {'loss': 0.0, 'learning_rate': 8.2e-05, 'epoch': 0.01} 18%|█▊ | 90/500 [01:38<03:09, 2.17it/s] 18%|█▊ | 91/500 [01:38<03:13, 2.11it/s] {'loss': 0.0, 'learning_rate': 8.18e-05, 'epoch': 0.01} 18%|█▊ | 91/500 [01:38<03:13, 2.11it/s] 18%|█▊ | 92/500 [01:39<03:14, 2.10it/s] {'loss': 0.0, 'learning_rate': 8.16e-05, 'epoch': 0.01} 18%|█▊ | 92/500 [01:39<03:14, 2.10it/s] 19%|█▊ | 93/500 [01:39<03:18, 2.05it/s] {'loss': 0.0, 'learning_rate': 8.14e-05, 'epoch': 0.01} 19%|█▊ | 93/500 [01:39<03:18, 2.05it/s] 19%|█▉ | 94/500 [01:40<03:16, 2.07it/s] {'loss': 0.0, 'learning_rate': 8.120000000000001e-05, 'epoch': 0.01} 19%|█▉ | 94/500 [01:40<03:16, 2.07it/s] 19%|█▉ | 95/500 [01:40<03:12, 2.10it/s] {'loss': 0.0, 'learning_rate': 8.1e-05, 'epoch': 0.01} 19%|█▉ | 95/500 [01:40<03:12, 2.10it/s] 19%|█▉ | 96/500 [01:41<03:19, 2.02it/s] {'loss': 0.0, 'learning_rate': 8.080000000000001e-05, 'epoch': 0.01} 19%|█▉ | 96/500 [01:41<03:19, 2.02it/s] 19%|█▉ | 97/500 [01:42<04:17, 1.57it/s] {'loss': 0.0, 'learning_rate': 8.060000000000001e-05, 'epoch': 0.01} 19%|█▉ | 97/500 [01:42<04:17, 1.57it/s] 20%|█▉ | 98/500 [01:43<05:47, 1.16it/s] {'loss': 0.0, 'learning_rate': 8.04e-05, 'epoch': 0.01} 20%|█▉ | 98/500 [01:43<05:47, 1.16it/s] 20%|█▉ | 99/500 [01:44<06:58, 1.04s/it] {'loss': 0.0, 'learning_rate': 8.020000000000001e-05, 'epoch': 0.01} 20%|█▉ | 99/500 [01:44<06:58, 1.04s/it] 20%|██ | 100/500 [01:46<07:36, 1.14s/it] {'loss': 0.0, 'learning_rate': 8e-05, 'epoch': 0.01} 20%|██ | 100/500 [01:46<07:36, 1.14s/it][INFO|tokenization_utils_base.py:2437] 2023-12-10 15:29:08,477 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-100/tokenizer_config.json [INFO|tokenization_utils_base.py:2446] 2023-12-10 15:29:08,478 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-100/special_tokens_map.json 20%|██ | 101/500 [01:47<07:13, 1.09s/it] {'loss': 0.0, 'learning_rate': 7.98e-05, 'epoch': 0.01} 20%|██ | 101/500 [01:47<07:13, 1.09s/it] 
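The `learning_rate` and `epoch` columns follow from the arguments printed at startup. This is a sketch under stated assumptions, not the Trainer's own code. First, the `linear` scheduler with `warmup_steps=0` is assumed to decay as lr(s) = 1e-4 × (500 − s) / 500, which is the same shape as `transformers.get_linear_schedule_with_warmup`. Second, the run is assumed to use 2 processes. That figure is inferred from `Total train batch size = 4` = 1 per device × 2 accumulation steps × world size, and is not printed directly. Third, the logged `epoch` is assumed to be micro-batches seen divided by the per-rank dataloader length (⌈52,002 / 2⌉ = 26,001), rounded to two decimals. `linear_lr` and `logged_epoch` are hypothetical helper names.

```python
import math

# Values taken from the argument dump and the "Running training" header.
LEARNING_RATE = 1e-4   # learning_rate=0.0001
MAX_STEPS = 500        # max_steps=500 (overrides num_train_epochs)
WARMUP_STEPS = 0       # warmup_steps=0, warmup_ratio=0.0
PER_DEVICE_BATCH = 1   # per_device_train_batch_size=1
GRAD_ACCUM = 2         # gradient_accumulation_steps=2
NUM_EXAMPLES = 52_002  # "Num examples = 52,002"
WORLD_SIZE = 2         # assumption: inferred from "Total train batch size = 4"


def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, MAX_STEPS - step)
    return LEARNING_RATE * remaining / max(1, MAX_STEPS - WARMUP_STEPS)


def logged_epoch(step: int) -> float:
    """Fractional epoch after `step` optimizer steps, rounded to 2 decimals (assumed formula)."""
    examples_per_rank = math.ceil(NUM_EXAMPLES / WORLD_SIZE)  # 26,001 per process
    batches_per_rank = math.ceil(examples_per_rank / PER_DEVICE_BATCH)
    micro_batches_seen = step * GRAD_ACCUM
    return round(micro_batches_seen / batches_per_rank, 2)


total_train_batch = PER_DEVICE_BATCH * GRAD_ACCUM * WORLD_SIZE
print("Total train batch size:", total_train_batch)
for step in (51, 65, 66, 100, 195, 196):
    print(step, f"{linear_lr(step):.2e}", logged_epoch(step))
```

Under these assumptions the sketch reproduces the logged 8.98e-05 at step 51 and 8e-05 at step 100. It also reproduces the `epoch` column's change from 0.0 to 0.01 between steps 65 and 66, and from 0.01 to 0.02 between steps 195 and 196.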
20%|██ | 102/500 [01:48<06:56, 1.05s/it] {'loss': 0.0, 'learning_rate': 7.960000000000001e-05, 'epoch': 0.01} 20%|██ | 102/500 [01:48<06:56, 1.05s/it] 21%|██ | 103/500 [01:49<06:42, 1.01s/it] {'loss': 0.0, 'learning_rate': 7.94e-05, 'epoch': 0.01} 21%|██ | 103/500 [01:49<06:42, 1.01s/it] 21%|██ | 104/500 [01:50<06:31, 1.01it/s] {'loss': 0.0, 'learning_rate': 7.920000000000001e-05, 'epoch': 0.01} 21%|██ | 104/500 [01:50<06:31, 1.01it/s] 21%|██ | 105/500 [01:50<06:20, 1.04it/s] {'loss': 0.0, 'learning_rate': 7.900000000000001e-05, 'epoch': 0.01} 21%|██ | 105/500 [01:50<06:20, 1.04it/s] 21%|██ | 106/500 [01:51<05:52, 1.12it/s] {'loss': 0.0, 'learning_rate': 7.88e-05, 'epoch': 0.01} 21%|██ | 106/500 [01:51<05:52, 1.12it/s] 21%|██▏ | 107/500 [01:52<05:32, 1.18it/s] {'loss': 0.0, 'learning_rate': 7.860000000000001e-05, 'epoch': 0.01} 21%|██▏ | 107/500 [01:52<05:32, 1.18it/s] 22%|██▏ | 108/500 [01:53<05:36, 1.16it/s] {'loss': 0.0, 'learning_rate': 7.840000000000001e-05, 'epoch': 0.01} 22%|██▏ | 108/500 [01:53<05:36, 1.16it/s] 22%|██▏ | 109/500 [01:54<05:25, 1.20it/s] {'loss': 0.0, 'learning_rate': 7.82e-05, 'epoch': 0.01} 22%|██▏ | 109/500 [01:54<05:25, 1.20it/s] 22%|██▏ | 110/500 [01:54<05:10, 1.26it/s] {'loss': 0.0, 'learning_rate': 7.800000000000001e-05, 'epoch': 0.01} 22%|██▏ | 110/500 [01:54<05:10, 1.26it/s] 22%|██▏ | 111/500 [01:55<05:12, 1.25it/s] {'loss': 0.0, 'learning_rate': 7.780000000000001e-05, 'epoch': 0.01} 22%|██▏ | 111/500 [01:55<05:12, 1.25it/s] 22%|██▏ | 112/500 [01:56<05:02, 1.28it/s] {'loss': 0.0, 'learning_rate': 7.76e-05, 'epoch': 0.01} 22%|██▏ | 112/500 [01:56<05:02, 1.28it/s] 23%|██▎ | 113/500 [01:57<04:59, 1.29it/s] {'loss': 0.0, 'learning_rate': 7.740000000000001e-05, 'epoch': 0.01} 23%|██▎ | 113/500 [01:57<04:59, 1.29it/s] 23%|██▎ | 114/500 [01:57<04:59, 1.29it/s] {'loss': 0.0, 'learning_rate': 7.72e-05, 'epoch': 0.01} 23%|██▎ | 114/500 [01:57<04:59, 1.29it/s] 23%|██▎ | 115/500 [01:58<05:12, 1.23it/s] {'loss': 0.0, 'learning_rate': 7.7e-05, 
'epoch': 0.01} 23%|██▎ | 115/500 [01:58<05:12, 1.23it/s] 23%|██▎ | 116/500 [01:59<05:44, 1.11it/s] {'loss': 0.0, 'learning_rate': 7.680000000000001e-05, 'epoch': 0.01} 23%|██▎ | 116/500 [01:59<05:44, 1.11it/s] 23%|██▎ | 117/500 [02:01<06:36, 1.03s/it] {'loss': 0.0, 'learning_rate': 7.66e-05, 'epoch': 0.01} 23%|██▎ | 117/500 [02:01<06:36, 1.03s/it] 24%|██▎ | 118/500 [02:02<07:16, 1.14s/it] {'loss': 0.0, 'learning_rate': 7.64e-05, 'epoch': 0.01} 24%|██▎ | 118/500 [02:02<07:16, 1.14s/it] 24%|██▍ | 119/500 [02:03<07:46, 1.22s/it] {'loss': 0.0, 'learning_rate': 7.620000000000001e-05, 'epoch': 0.01} 24%|██▍ | 119/500 [02:03<07:46, 1.22s/it] 24%|██▍ | 120/500 [02:05<08:08, 1.29s/it] {'loss': 0.0, 'learning_rate': 7.6e-05, 'epoch': 0.01} 24%|██▍ | 120/500 [02:05<08:08, 1.29s/it] 24%|██▍ | 121/500 [02:06<08:16, 1.31s/it] {'loss': 0.0, 'learning_rate': 7.58e-05, 'epoch': 0.01} 24%|██▍ | 121/500 [02:06<08:16, 1.31s/it] 24%|██▍ | 122/500 [02:08<08:28, 1.35s/it] {'loss': 0.0, 'learning_rate': 7.560000000000001e-05, 'epoch': 0.01} 24%|██▍ | 122/500 [02:08<08:28, 1.35s/it] 25%|██▍ | 123/500 [02:09<08:34, 1.36s/it] {'loss': 0.0, 'learning_rate': 7.54e-05, 'epoch': 0.01} 25%|██▍ | 123/500 [02:09<08:34, 1.36s/it] 25%|██▍ | 124/500 [02:11<08:42, 1.39s/it] {'loss': 0.0, 'learning_rate': 7.52e-05, 'epoch': 0.01} 25%|██▍ | 124/500 [02:11<08:42, 1.39s/it] 25%|██▌ | 125/500 [02:12<08:43, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.500000000000001e-05, 'epoch': 0.01} 25%|██▌ | 125/500 [02:12<08:43, 1.40s/it] 25%|██▌ | 126/500 [02:13<08:43, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.48e-05, 'epoch': 0.01} 25%|██▌ | 126/500 [02:13<08:43, 1.40s/it] 25%|██▌ | 127/500 [02:15<08:47, 1.42s/it] {'loss': 0.0, 'learning_rate': 7.46e-05, 'epoch': 0.01} 25%|██▌ | 127/500 [02:15<08:47, 1.42s/it] 26%|██▌ | 128/500 [02:16<08:46, 1.42s/it] {'loss': 0.0, 'learning_rate': 7.44e-05, 'epoch': 0.01} 26%|██▌ | 128/500 [02:16<08:46, 1.42s/it] 26%|██▌ | 129/500 [02:18<08:45, 1.42s/it] {'loss': 0.0, 
'learning_rate': 7.42e-05, 'epoch': 0.01} 26%|██▌ | 129/500 [02:18<08:45, 1.42s/it] 26%|██▌ | 130/500 [02:19<08:42, 1.41s/it] {'loss': 0.0, 'learning_rate': 7.4e-05, 'epoch': 0.01} 26%|██▌ | 130/500 [02:19<08:42, 1.41s/it] 26%|██▌ | 131/500 [02:20<08:38, 1.41s/it] {'loss': 0.0, 'learning_rate': 7.38e-05, 'epoch': 0.01} 26%|██▌ | 131/500 [02:20<08:38, 1.41s/it] 26%|██▋ | 132/500 [02:22<08:35, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.36e-05, 'epoch': 0.01} 26%|██▋ | 132/500 [02:22<08:35, 1.40s/it] 27%|██▋ | 133/500 [02:23<08:37, 1.41s/it] {'loss': 0.0, 'learning_rate': 7.340000000000001e-05, 'epoch': 0.01} 27%|██▋ | 133/500 [02:23<08:37, 1.41s/it] 27%|██▋ | 134/500 [02:25<08:30, 1.39s/it] {'loss': 0.0, 'learning_rate': 7.32e-05, 'epoch': 0.01} 27%|██▋ | 134/500 [02:25<08:30, 1.39s/it] 27%|██▋ | 135/500 [02:26<08:32, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.3e-05, 'epoch': 0.01} 27%|██▋ | 135/500 [02:26<08:32, 1.40s/it] 27%|██▋ | 136/500 [02:28<08:35, 1.42s/it] {'loss': 0.0, 'learning_rate': 7.280000000000001e-05, 'epoch': 0.01} 27%|██▋ | 136/500 [02:28<08:35, 1.42s/it] 27%|██▋ | 137/500 [02:29<08:33, 1.42s/it] {'loss': 0.0, 'learning_rate': 7.26e-05, 'epoch': 0.01} 27%|██▋ | 137/500 [02:29<08:33, 1.42s/it] 28%|██▊ | 138/500 [02:30<08:29, 1.41s/it] {'loss': 0.0, 'learning_rate': 7.24e-05, 'epoch': 0.01} 28%|██▊ | 138/500 [02:30<08:29, 1.41s/it] 28%|██▊ | 139/500 [02:32<08:22, 1.39s/it] {'loss': 0.0, 'learning_rate': 7.22e-05, 'epoch': 0.01} 28%|██▊ | 139/500 [02:32<08:22, 1.39s/it] 28%|██▊ | 140/500 [02:33<08:21, 1.39s/it] {'loss': 0.0, 'learning_rate': 7.2e-05, 'epoch': 0.01} 28%|██▊ | 140/500 [02:33<08:21, 1.39s/it] 28%|██▊ | 141/500 [02:34<08:20, 1.39s/it] {'loss': 0.0, 'learning_rate': 7.18e-05, 'epoch': 0.01} 28%|██▊ | 141/500 [02:34<08:20, 1.39s/it] 28%|██▊ | 142/500 [02:36<08:21, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.16e-05, 'epoch': 0.01} 28%|██▊ | 142/500 [02:36<08:21, 1.40s/it] 29%|██▊ | 143/500 [02:37<08:20, 1.40s/it] {'loss': 0.0, 'learning_rate': 
7.14e-05, 'epoch': 0.01} 29%|██▊ | 143/500 [02:37<08:20, 1.40s/it] 29%|██▉ | 144/500 [02:39<08:20, 1.41s/it] {'loss': 0.0, 'learning_rate': 7.12e-05, 'epoch': 0.01} 29%|██▉ | 144/500 [02:39<08:20, 1.41s/it] 29%|██▉ | 145/500 [02:40<08:17, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.1e-05, 'epoch': 0.01} 29%|██▉ | 145/500 [02:40<08:17, 1.40s/it] 29%|██▉ | 146/500 [02:41<08:15, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.08e-05, 'epoch': 0.01} 29%|██▉ | 146/500 [02:42<08:15, 1.40s/it] 29%|██▉ | 147/500 [02:43<08:15, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.06e-05, 'epoch': 0.01} 29%|██▉ | 147/500 [02:43<08:15, 1.40s/it] 30%|██▉ | 148/500 [02:44<08:12, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.04e-05, 'epoch': 0.01} 30%|██▉ | 148/500 [02:44<08:12, 1.40s/it] 30%|██▉ | 149/500 [02:46<08:16, 1.41s/it] {'loss': 0.0, 'learning_rate': 7.02e-05, 'epoch': 0.01} 30%|██▉ | 149/500 [02:46<08:16, 1.41s/it] 30%|███ | 150/500 [02:47<08:08, 1.40s/it] {'loss': 0.0, 'learning_rate': 7e-05, 'epoch': 0.01} 30%|███ | 150/500 [02:47<08:08, 1.40s/it][INFO|tokenization_utils_base.py:2437] 2023-12-10 15:30:09,868 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-150/tokenizer_config.json [INFO|tokenization_utils_base.py:2446] 2023-12-10 15:30:09,869 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-150/special_tokens_map.json 30%|███ | 151/500 [02:49<08:12, 1.41s/it] {'loss': 0.0, 'learning_rate': 6.98e-05, 'epoch': 0.01} 30%|███ | 151/500 [02:49<08:12, 1.41s/it] 30%|███ | 152/500 [02:50<08:13, 1.42s/it] {'loss': 0.0, 'learning_rate': 6.96e-05, 'epoch': 0.01} 30%|███ | 152/500 [02:50<08:13, 1.42s/it] 31%|███ | 153/500 [02:51<08:08, 1.41s/it] {'loss': 0.0, 'learning_rate': 6.939999999999999e-05, 'epoch': 0.01} 31%|███ | 153/500 [02:51<08:08, 1.41s/it] 31%|███ | 154/500 [02:53<08:01, 1.39s/it] {'loss': 0.0, 'learning_rate': 6.92e-05, 'epoch': 0.01} 31%|███ | 154/500 [02:53<08:01, 1.39s/it] 31%|███ | 155/500 [02:54<07:59, 1.39s/it] {'loss': 
0.0, 'learning_rate': 6.9e-05, 'epoch': 0.01} 31%|███ | 155/500 [02:54<07:59, 1.39s/it] 31%|███ | 156/500 [02:56<08:01, 1.40s/it] {'loss': 0.0, 'learning_rate': 6.879999999999999e-05, 'epoch': 0.01} 31%|███ | 156/500 [02:56<08:01, 1.40s/it] 31%|███▏ | 157/500 [02:57<08:01, 1.40s/it] {'loss': 0.0, 'learning_rate': 6.860000000000001e-05, 'epoch': 0.01} 31%|███▏ | 157/500 [02:57<08:01, 1.40s/it] 32%|███▏ | 158/500 [02:58<07:59, 1.40s/it] {'loss': 0.0, 'learning_rate': 6.840000000000001e-05, 'epoch': 0.01} 32%|███▏ | 158/500 [02:58<07:59, 1.40s/it] 32%|███▏ | 159/500 [03:00<08:00, 1.41s/it] {'loss': 0.0, 'learning_rate': 6.82e-05, 'epoch': 0.01} 32%|███▏ | 159/500 [03:00<08:00, 1.41s/it] 32%|███▏ | 160/500 [03:01<07:55, 1.40s/it] {'loss': 0.0, 'learning_rate': 6.800000000000001e-05, 'epoch': 0.01} 32%|███▏ | 160/500 [03:01<07:55, 1.40s/it] 32%|███▏ | 161/500 [03:03<08:00, 1.42s/it] {'loss': 0.0, 'learning_rate': 6.780000000000001e-05, 'epoch': 0.01} 32%|███▏ | 161/500 [03:03<08:00, 1.42s/it] 32%|███▏ | 162/500 [03:04<07:59, 1.42s/it] {'loss': 0.0, 'learning_rate': 6.76e-05, 'epoch': 0.01} 32%|███▏ | 162/500 [03:04<07:59, 1.42s/it] 33%|███▎ | 163/500 [03:05<07:55, 1.41s/it] {'loss': 0.0, 'learning_rate': 6.740000000000001e-05, 'epoch': 0.01} 33%|███▎ | 163/500 [03:05<07:55, 1.41s/it] 33%|███▎ | 164/500 [03:07<07:55, 1.42s/it] {'loss': 0.0, 'learning_rate': 6.720000000000001e-05, 'epoch': 0.01} 33%|███▎ | 164/500 [03:07<07:55, 1.42s/it] 33%|███▎ | 165/500 [03:08<07:58, 1.43s/it] {'loss': 0.0, 'learning_rate': 6.7e-05, 'epoch': 0.01} 33%|███▎ | 165/500 [03:08<07:58, 1.43s/it] 33%|███▎ | 166/500 [03:10<07:56, 1.43s/it] {'loss': 0.0, 'learning_rate': 6.680000000000001e-05, 'epoch': 0.01} 33%|███▎ | 166/500 [03:10<07:56, 1.43s/it] 33%|███▎ | 167/500 [03:11<07:58, 1.44s/it] {'loss': 0.0, 'learning_rate': 6.66e-05, 'epoch': 0.01} 33%|███▎ | 167/500 [03:11<07:58, 1.44s/it] 34%|███▎ | 168/500 [03:13<07:57, 1.44s/it] {'loss': 0.0, 'learning_rate': 6.64e-05, 'epoch': 0.01} 
34%|███▎ | 168/500 [03:13<07:57, 1.44s/it] 34%|███▍ | 169/500 [03:14<07:57, 1.44s/it] {'loss': 0.0, 'learning_rate': 6.620000000000001e-05, 'epoch': 0.01} 34%|███▍ | 169/500 [03:14<07:57, 1.44s/it] 34%|███▍ | 170/500 [03:16<08:00, 1.46s/it] {'loss': 0.0, 'learning_rate': 6.6e-05, 'epoch': 0.01} 34%|███▍ | 170/500 [03:16<08:00, 1.46s/it] 34%|███▍ | 171/500 [03:17<07:57, 1.45s/it] {'loss': 0.0, 'learning_rate': 6.58e-05, 'epoch': 0.01} 34%|███▍ | 171/500 [03:17<07:57, 1.45s/it] 34%|███▍ | 172/500 [03:18<07:55, 1.45s/it] {'loss': 0.0, 'learning_rate': 6.560000000000001e-05, 'epoch': 0.01} 34%|███▍ | 172/500 [03:18<07:55, 1.45s/it] 35%|███▍ | 173/500 [03:20<07:57, 1.46s/it] {'loss': 0.0, 'learning_rate': 6.54e-05, 'epoch': 0.01} 35%|███▍ | 173/500 [03:20<07:57, 1.46s/it] 35%|███▍ | 174/500 [03:21<07:48, 1.44s/it] {'loss': 0.0, 'learning_rate': 6.52e-05, 'epoch': 0.01} 35%|███▍ | 174/500 [03:21<07:48, 1.44s/it] 35%|███▌ | 175/500 [03:22<06:59, 1.29s/it] {'loss': 0.0, 'learning_rate': 6.500000000000001e-05, 'epoch': 0.01} 35%|███▌ | 175/500 [03:22<06:59, 1.29s/it] 35%|███▌ | 176/500 [03:23<06:21, 1.18s/it] {'loss': 0.0, 'learning_rate': 6.48e-05, 'epoch': 0.01} 35%|███▌ | 176/500 [03:23<06:21, 1.18s/it] 35%|███▌ | 177/500 [03:24<05:43, 1.06s/it] {'loss': 0.0, 'learning_rate': 6.460000000000001e-05, 'epoch': 0.01} 35%|███▌ | 177/500 [03:24<05:43, 1.06s/it] 36%|███▌ | 178/500 [03:24<04:43, 1.14it/s] {'loss': 0.0, 'learning_rate': 6.440000000000001e-05, 'epoch': 0.01} 36%|███▌ | 178/500 [03:24<04:43, 1.14it/s] 36%|███▌ | 179/500 [03:25<04:00, 1.34it/s] {'loss': 0.0, 'learning_rate': 6.42e-05, 'epoch': 0.01} 36%|███▌ | 179/500 [03:25<04:00, 1.34it/s] 36%|███▌ | 180/500 [03:25<03:30, 1.52it/s] {'loss': 0.0, 'learning_rate': 6.400000000000001e-05, 'epoch': 0.01} 36%|███▌ | 180/500 [03:25<03:30, 1.52it/s] 36%|███▌ | 181/500 [03:26<03:09, 1.68it/s] {'loss': 0.0, 'learning_rate': 6.38e-05, 'epoch': 0.01} 36%|███▌ | 181/500 [03:26<03:09, 1.68it/s] 36%|███▋ | 182/500 [03:26<02:54, 
1.82it/s] {'loss': 0.0, 'learning_rate': 6.36e-05, 'epoch': 0.01} 36%|███▋ | 182/500 [03:26<02:54, 1.82it/s] 37%|███▋ | 183/500 [03:27<02:44, 1.92it/s] {'loss': 0.0, 'learning_rate': 6.340000000000001e-05, 'epoch': 0.01} 37%|███▋ | 183/500 [03:27<02:44, 1.92it/s] 37%|███▋ | 184/500 [03:27<02:37, 2.00it/s] {'loss': 0.0, 'learning_rate': 6.32e-05, 'epoch': 0.01} 37%|███▋ | 184/500 [03:27<02:37, 2.00it/s] 37%|███▋ | 185/500 [03:28<02:33, 2.05it/s] {'loss': 0.0, 'learning_rate': 6.3e-05, 'epoch': 0.01} 37%|███▋ | 185/500 [03:28<02:33, 2.05it/s] 37%|███▋ | 186/500 [03:28<02:30, 2.09it/s] {'loss': 0.0, 'learning_rate': 6.280000000000001e-05, 'epoch': 0.01} 37%|███▋ | 186/500 [03:28<02:30, 2.09it/s] 37%|███▋ | 187/500 [03:28<02:27, 2.12it/s] {'loss': 0.0, 'learning_rate': 6.26e-05, 'epoch': 0.01} 37%|███▋ | 187/500 [03:28<02:27, 2.12it/s] 38%|███▊ | 188/500 [03:29<02:25, 2.14it/s] {'loss': 0.0, 'learning_rate': 6.24e-05, 'epoch': 0.01} 38%|███▊ | 188/500 [03:29<02:25, 2.14it/s] 38%|███▊ | 189/500 [03:29<02:24, 2.15it/s] {'loss': 0.0, 'learning_rate': 6.220000000000001e-05, 'epoch': 0.01} 38%|███▊ | 189/500 [03:29<02:24, 2.15it/s] 38%|███▊ | 190/500 [03:30<02:23, 2.16it/s] {'loss': 0.0, 'learning_rate': 6.2e-05, 'epoch': 0.01} 38%|███▊ | 190/500 [03:30<02:23, 2.16it/s] 38%|███▊ | 191/500 [03:30<02:27, 2.10it/s] {'loss': 0.0, 'learning_rate': 6.18e-05, 'epoch': 0.01} 38%|███▊ | 191/500 [03:30<02:27, 2.10it/s] 38%|███▊ | 192/500 [03:31<02:42, 1.90it/s] {'loss': 0.0, 'learning_rate': 6.16e-05, 'epoch': 0.01} 38%|███▊ | 192/500 [03:31<02:42, 1.90it/s] 39%|███▊ | 193/500 [03:32<03:04, 1.66it/s] {'loss': 0.0, 'learning_rate': 6.14e-05, 'epoch': 0.01} 39%|███▊ | 193/500 [03:32<03:04, 1.66it/s] 39%|███▉ | 194/500 [03:33<03:20, 1.52it/s] {'loss': 0.0, 'learning_rate': 6.12e-05, 'epoch': 0.01} 39%|███▉ | 194/500 [03:33<03:20, 1.52it/s] 39%|███▉ | 195/500 [03:34<04:13, 1.21it/s] {'loss': 0.0, 'learning_rate': 6.1e-05, 'epoch': 0.01} 39%|███▉ | 195/500 [03:34<04:13, 1.21it/s] 39%|███▉ 
| 196/500 [03:35<04:31, 1.12it/s] {'loss': 0.0, 'learning_rate': 6.08e-05, 'epoch': 0.02} 39%|███▉ | 196/500 [03:35<04:31, 1.12it/s] 39%|███▉ | 197/500 [03:36<04:32, 1.11it/s] {'loss': 0.0, 'learning_rate': 6.06e-05, 'epoch': 0.02} 39%|███▉ | 197/500 [03:36<04:32, 1.11it/s] 40%|███▉ | 198/500 [03:36<04:17, 1.17it/s] {'loss': 0.0, 'learning_rate': 6.04e-05, 'epoch': 0.02} 40%|███▉ | 198/500 [03:36<04:17, 1.17it/s] 40%|███▉ | 199/500 [03:37<03:39, 1.37it/s] {'loss': 0.0, 'learning_rate': 6.02e-05, 'epoch': 0.02} 40%|███▉ | 199/500 [03:37<03:39, 1.37it/s] 40%|████ | 200/500 [03:37<03:13, 1.55it/s] {'loss': 0.0, 'learning_rate': 6e-05, 'epoch': 0.02} 40%|████ | 200/500 [03:37<03:13, 1.55it/s][INFO|tokenization_utils_base.py:2437] 2023-12-10 15:31:00,119 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-200/tokenizer_config.json [INFO|tokenization_utils_base.py:2446] 2023-12-10 15:31:00,120 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-200/special_tokens_map.json 40%|████ | 201/500 [03:38<02:58, 1.68it/s] {'loss': 0.0, 'learning_rate': 5.9800000000000003e-05, 'epoch': 0.02} 40%|████ | 201/500 [03:38<02:58, 1.68it/s] 40%|████ | 202/500 [03:38<02:43, 1.82it/s] {'loss': 0.0, 'learning_rate': 5.96e-05, 'epoch': 0.02} 40%|████ | 202/500 [03:38<02:43, 1.82it/s] 41%|████ | 203/500 [03:39<02:33, 1.93it/s] {'loss': 0.0, 'learning_rate': 5.94e-05, 'epoch': 0.02} 41%|████ | 203/500 [03:39<02:33, 1.93it/s] 41%|████ | 204/500 [03:39<02:26, 2.02it/s] {'loss': 0.0, 'learning_rate': 5.92e-05, 'epoch': 0.02} 41%|████ | 204/500 [03:39<02:26, 2.02it/s] 41%|████ | 205/500 [03:40<02:21, 2.08it/s] {'loss': 0.0, 'learning_rate': 5.9e-05, 'epoch': 0.02} 41%|████ | 205/500 [03:40<02:21, 2.08it/s] 41%|████ | 206/500 [03:40<02:18, 2.13it/s] {'loss': 0.0, 'learning_rate': 5.88e-05, 'epoch': 0.02} 41%|████ | 206/500 [03:40<02:18, 2.13it/s] 41%|████▏ | 207/500 [03:41<02:15, 2.16it/s] {'loss': 0.0, 'learning_rate': 5.86e-05, 'epoch': 0.02} 
41%|████▏ | 207/500 [03:41<02:15, 2.16it/s] 42%|████▏ | 208/500 [03:41<02:14, 2.17it/s] {'loss': 0.0, 'learning_rate': 5.8399999999999997e-05, 'epoch': 0.02} 42%|████▏ | 208/500 [03:41<02:14, 2.17it/s] 42%|████▏ | 209/500 [03:41<02:13, 2.18it/s] {'loss': 0.0, 'learning_rate': 5.82e-05, 'epoch': 0.02} 42%|████▏ | 209/500 [03:41<02:13, 2.18it/s] 42%|████▏ | 210/500 [03:42<02:13, 2.18it/s] {'loss': 0.0, 'learning_rate': 5.8e-05, 'epoch': 0.02} 42%|████▏ | 210/500 [03:42<02:13, 2.18it/s] 42%|████▏ | 211/500 [03:42<02:12, 2.18it/s] {'loss': 0.0, 'learning_rate': 5.7799999999999995e-05, 'epoch': 0.02} 42%|████▏ | 211/500 [03:42<02:12, 2.18it/s] 42%|████▏ | 212/500 [03:43<02:11, 2.18it/s] {'loss': 0.0, 'learning_rate': 5.76e-05, 'epoch': 0.02} 42%|████▏ | 212/500 [03:43<02:11, 2.18it/s] 43%|████▎ | 213/500 [03:43<02:16, 2.10it/s] {'loss': 0.0, 'learning_rate': 5.74e-05, 'epoch': 0.02} 43%|████▎ | 213/500 [03:43<02:16, 2.10it/s] 43%|████▎ | 214/500 [03:44<02:15, 2.10it/s] {'loss': 0.0, 'learning_rate': 5.72e-05, 'epoch': 0.02} 43%|████▎ | 214/500 [03:44<02:15, 2.10it/s] 43%|████▎ | 215/500 [03:44<02:14, 2.11it/s] {'loss': 0.0, 'learning_rate': 5.6999999999999996e-05, 'epoch': 0.02} 43%|████▎ | 215/500 [03:44<02:14, 2.11it/s] 43%|████▎ | 216/500 [03:45<02:13, 2.12it/s] {'loss': 0.0, 'learning_rate': 5.68e-05, 'epoch': 0.02} 43%|████▎ | 216/500 [03:45<02:13, 2.12it/s] 43%|████▎ | 217/500 [03:45<02:15, 2.09it/s] {'loss': 0.0, 'learning_rate': 5.66e-05, 'epoch': 0.02} 43%|████▎ | 217/500 [03:45<02:15, 2.09it/s] 44%|████▎ | 218/500 [03:46<02:17, 2.05it/s] {'loss': 0.0, 'learning_rate': 5.6399999999999995e-05, 'epoch': 0.02} 44%|████▎ | 218/500 [03:46<02:17, 2.05it/s] 44%|████▍ | 219/500 [03:46<02:16, 2.07it/s] {'loss': 0.0, 'learning_rate': 5.620000000000001e-05, 'epoch': 0.02} 44%|████▍ | 219/500 [03:46<02:16, 2.07it/s] 44%|████▍ | 220/500 [03:47<02:25, 1.92it/s] {'loss': 0.0, 'learning_rate': 5.6000000000000006e-05, 'epoch': 0.02} 44%|████▍ | 220/500 [03:47<02:25, 1.92it/s] 
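Every `loss` logged in this stretch is exactly `0.0`. The sanity-check dump above shows padding positions (`''`, id 0) mapped to label `-100`, which cross-entropy ignores. One possible explanation follows, and it is an assumption rather than a diagnosis. If a batch contains no supervised tokens at all, the mean loss is 0/0 = nan. With `logging_nan_inf_filter=True`, as set in the arguments above, the Trainer filters non-finite losses out of the log instead of printing nan. The pure-Python sketch below illustrates that chain and needs no PyTorch. `count_supervised`, `masked_mean_ce` and `filtered_window_loss` are hypothetical helpers, and the filter model is a simplification, not the Trainer's exact accumulation code.

```python
import math

IGNORE_INDEX = -100  # label value used for masked positions in the sanity check


def count_supervised(labels, ignore_index=IGNORE_INDEX):
    """Number of positions that actually contribute to the loss."""
    return sum(1 for label in labels if label != ignore_index)


def masked_mean_ce(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean token cross-entropy over positions whose label != ignore_index."""
    total, n = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked position, e.g. padding id 0 mapped to -100
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
        n += 1
    # With zero supervised tokens the mean is 0/0; return nan, which is also what
    # PyTorch's CrossEntropyLoss(reduction="mean") yields when every target is ignored.
    return total / n if n else float("nan")


def filtered_window_loss(losses):
    """Simplified model of logging_nan_inf_filter=True for one logging window.

    Non-finite losses are dropped and the rest averaged; if none are finite,
    the accumulator's initial 0.0 is what gets reported.
    """
    finite = [x for x in losses if math.isfinite(x)]
    return sum(finite) / len(finite) if finite else 0.0


labels = [IGNORE_INDEX] * 8           # every position masked
logits = [[0.0] * 4 for _ in labels]  # dummy logits over a 4-token vocabulary
print(count_supervised(labels), masked_mean_ce(logits, labels))  # 0 nan
print(filtered_window_loss([float("nan"), float("nan")]))        # 0.0
```

Running `count_supervised` on a few real `labels` rows from the training dataset would show whether this explanation applies. A non-zero count would rule it out.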
44%|████▍ | 221/500 [03:47<02:33, 1.82it/s] {'loss': 0.0, 'learning_rate': 5.580000000000001e-05, 'epoch': 0.02} 44%|████▍ | 221/500 [03:47<02:33, 1.82it/s] 44%|████▍ | 222/500 [03:48<02:29, 1.86it/s] {'loss': 0.0, 'learning_rate': 5.560000000000001e-05, 'epoch': 0.02} 44%|████▍ | 222/500 [03:48<02:29, 1.86it/s] 45%|████▍ | 223/500 [03:48<02:23, 1.93it/s] {'loss': 0.0, 'learning_rate': 5.5400000000000005e-05, 'epoch': 0.02} 45%|████▍ | 223/500 [03:48<02:23, 1.93it/s] 45%|████▍ | 224/500 [03:49<02:27, 1.87it/s] {'loss': 0.0, 'learning_rate': 5.520000000000001e-05, 'epoch': 0.02} 45%|████▍ | 224/500 [03:49<02:27, 1.87it/s] 45%|████▌ | 225/500 [03:50<02:28, 1.85it/s] {'loss': 0.0, 'learning_rate': 5.500000000000001e-05, 'epoch': 0.02} 45%|████▌ | 225/500 [03:50<02:28, 1.85it/s] 45%|████▌ | 226/500 [03:50<02:28, 1.84it/s] {'loss': 0.0, 'learning_rate': 5.4800000000000004e-05, 'epoch': 0.02} 45%|████▌ | 226/500 [03:50<02:28, 1.84it/s] 45%|████▌ | 227/500 [03:51<02:38, 1.72it/s] {'loss': 0.0, 'learning_rate': 5.4600000000000006e-05, 'epoch': 0.02} 45%|████▌ | 227/500 [03:51<02:38, 1.72it/s] 46%|████▌ | 228/500 [03:51<02:33, 1.77it/s] {'loss': 0.0, 'learning_rate': 5.440000000000001e-05, 'epoch': 0.02} 46%|████▌ | 228/500 [03:51<02:33, 1.77it/s] 46%|████▌ | 229/500 [03:52<02:31, 1.79it/s] {'loss': 0.0, 'learning_rate': 5.420000000000001e-05, 'epoch': 0.02} 46%|████▌ | 229/500 [03:52<02:31, 1.79it/s] 46%|████▌ | 230/500 [03:52<02:34, 1.75it/s] {'loss': 0.0, 'learning_rate': 5.4000000000000005e-05, 'epoch': 0.02} 46%|████▌ | 230/500 [03:52<02:34, 1.75it/s] 46%|████▌ | 231/500 [03:53<02:40, 1.68it/s] {'loss': 0.0, 'learning_rate': 5.380000000000001e-05, 'epoch': 0.02} 46%|████▌ | 231/500 [03:53<02:40, 1.68it/s] 46%|████▋ | 232/500 [03:54<03:22, 1.32it/s] {'loss': 0.0, 'learning_rate': 5.360000000000001e-05, 'epoch': 0.02} 46%|████▋ | 232/500 [03:54<03:22, 1.32it/s] 47%|████▋ | 233/500 [03:56<04:05, 1.09it/s] {'loss': 0.0, 'learning_rate': 5.3400000000000004e-05, 'epoch': 
0.02} 47%|████▋ | 233/500 [03:56<04:05, 1.09it/s] 47%|████▋ | 234/500 [03:57<04:46, 1.08s/it] {'loss': 0.0, 'learning_rate': 5.3200000000000006e-05, 'epoch': 0.02} 47%|████▋ | 234/500 [03:57<04:46, 1.08s/it] 47%|████▋ | 235/500 [03:58<05:07, 1.16s/it] {'loss': 0.0, 'learning_rate': 5.300000000000001e-05, 'epoch': 0.02} 47%|████▋ | 235/500 [03:58<05:07, 1.16s/it] 47%|████▋ | 236/500 [04:00<05:26, 1.24s/it] {'loss': 0.0, 'learning_rate': 5.28e-05, 'epoch': 0.02} 47%|████▋ | 236/500 [04:00<05:26, 1.24s/it] 47%|████▋ | 237/500 [04:01<05:33, 1.27s/it] {'loss': 0.0, 'learning_rate': 5.2600000000000005e-05, 'epoch': 0.02} 47%|████▋ | 237/500 [04:01<05:33, 1.27s/it] 48%|████▊ | 238/500 [04:03<05:45, 1.32s/it] {'loss': 0.0, 'learning_rate': 5.2400000000000007e-05, 'epoch': 0.02} 48%|████▊ | 238/500 [04:03<05:45, 1.32s/it] 48%|████▊ | 239/500 [04:04<05:48, 1.34s/it] {'loss': 0.0, 'learning_rate': 5.22e-05, 'epoch': 0.02} 48%|████▊ | 239/500 [04:04<05:48, 1.34s/it] 48%|████▊ | 240/500 [04:05<05:53, 1.36s/it] {'loss': 0.0, 'learning_rate': 5.2000000000000004e-05, 'epoch': 0.02} 48%|████▊ | 240/500 [04:05<05:53, 1.36s/it] 48%|████▊ | 241/500 [04:07<05:56, 1.38s/it] {'loss': 0.0, 'learning_rate': 5.1800000000000005e-05, 'epoch': 0.02} 48%|████▊ | 241/500 [04:07<05:56, 1.38s/it] 48%|████▊ | 242/500 [04:08<05:57, 1.38s/it] {'loss': 0.0, 'learning_rate': 5.16e-05, 'epoch': 0.02} 48%|████▊ | 242/500 [04:08<05:57, 1.38s/it] 49%|████▊ | 243/500 [04:10<05:56, 1.39s/it] {'loss': 0.0, 'learning_rate': 5.14e-05, 'epoch': 0.02} 49%|████▊ | 243/500 [04:10<05:56, 1.39s/it] 49%|████▉ | 244/500 [04:11<06:02, 1.42s/it] {'loss': 0.0, 'learning_rate': 5.1200000000000004e-05, 'epoch': 0.02} 49%|████▉ | 244/500 [04:11<06:02, 1.42s/it] 49%|████▉ | 245/500 [04:12<06:02, 1.42s/it] {'loss': 0.0, 'learning_rate': 5.1000000000000006e-05, 'epoch': 0.02} 49%|████▉ | 245/500 [04:12<06:02, 1.42s/it] 49%|████▉ | 246/500 [04:14<06:01, 1.42s/it] {'loss': 0.0, 'learning_rate': 5.08e-05, 'epoch': 0.02} 49%|████▉ 
| 246/500 [04:14<06:01, 1.42s/it] 49%|████▉ | 247/500 [04:15<05:59, 1.42s/it] {'loss': 0.0, 'learning_rate': 5.0600000000000003e-05, 'epoch': 0.02} 49%|████▉ | 247/500 [04:15<05:59, 1.42s/it] 50%|████▉ | 248/500 [04:17<05:55, 1.41s/it] {'loss': 0.0, 'learning_rate': 5.0400000000000005e-05, 'epoch': 0.02} 50%|████▉ | 248/500 [04:17<05:55, 1.41s/it] 50%|████▉ | 249/500 [04:18<05:54, 1.41s/it] {'loss': 0.0, 'learning_rate': 5.02e-05, 'epoch': 0.02} 50%|████▉ | 249/500 [04:18<05:54, 1.41s/it] 50%|█████ | 250/500 [04:19<05:51, 1.41s/it] {'loss': 0.0, 'learning_rate': 5e-05, 'epoch': 0.02} 50%|█████ | 250/500 [04:20<05:51, 1.41s/it][INFO|tokenization_utils_base.py:2437] 2023-12-10 15:31:42,260 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-250/tokenizer_config.json [INFO|tokenization_utils_base.py:2446] 2023-12-10 15:31:42,261 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-250/special_tokens_map.json 50%|█████ | 251/500 [04:21<06:00, 1.45s/it] {'loss': 0.0, 'learning_rate': 4.9800000000000004e-05, 'epoch': 0.02} 50%|█████ | 251/500 [04:21<06:00, 1.45s/it] 50%|█████ | 252/500 [04:22<05:55, 1.43s/it] {'loss': 0.0, 'learning_rate': 4.96e-05, 'epoch': 0.02} 50%|█████ | 252/500 [04:22<05:55, 1.43s/it] 51%|█████ | 253/500 [04:24<05:51, 1.42s/it] {'loss': 0.0, 'learning_rate': 4.94e-05, 'epoch': 0.02} 51%|█████ | 253/500 [04:24<05:51, 1.42s/it] 51%|█████ | 254/500 [04:25<05:48, 1.42s/it] {'loss': 0.0, 'learning_rate': 4.92e-05, 'epoch': 0.02} 51%|█████ | 254/500 [04:25<05:48, 1.42s/it] 51%|█████ | 255/500 [04:27<05:39, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.9e-05, 'epoch': 0.02} 51%|█████ | 255/500 [04:27<05:39, 1.39s/it] 51%|█████ | 256/500 [04:28<05:38, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.88e-05, 'epoch': 0.02} 51%|█████ | 256/500 [04:28<05:38, 1.39s/it] 51%|█████▏ | 257/500 [04:29<05:35, 1.38s/it] {'loss': 0.0, 'learning_rate': 4.86e-05, 'epoch': 0.02} 51%|█████▏ | 257/500 [04:29<05:35, 1.38s/it] 
52%|█████▏ | 258/500 [04:31<05:36, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.8400000000000004e-05, 'epoch': 0.02} 52%|█████▏ | 258/500 [04:31<05:36, 1.39s/it] 52%|█████▏ | 259/500 [04:32<05:33, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.82e-05, 'epoch': 0.02} 52%|█████▏ | 259/500 [04:32<05:33, 1.39s/it] 52%|█████▏ | 260/500 [04:33<05:30, 1.38s/it] {'loss': 0.0, 'learning_rate': 4.8e-05, 'epoch': 0.02} 52%|█████▏ | 260/500 [04:33<05:30, 1.38s/it] 52%|█████▏ | 261/500 [04:35<05:31, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.78e-05, 'epoch': 0.02} 52%|█████▏ | 261/500 [04:35<05:31, 1.39s/it] 52%|█████▏ | 262/500 [04:36<05:31, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.76e-05, 'epoch': 0.02} 52%|█████▏ | 262/500 [04:36<05:31, 1.39s/it] 53%|█████▎ | 263/500 [04:38<05:31, 1.40s/it] {'loss': 0.0, 'learning_rate': 4.74e-05, 'epoch': 0.02} 53%|█████▎ | 263/500 [04:38<05:31, 1.40s/it] 53%|█████▎ | 264/500 [04:39<05:28, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.72e-05, 'epoch': 0.02} 53%|█████▎ | 264/500 [04:39<05:28, 1.39s/it] 53%|█████▎ | 265/500 [04:40<05:27, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.7e-05, 'epoch': 0.02} 53%|█████▎ | 265/500 [04:40<05:27, 1.39s/it] 53%|█████▎ | 266/500 [04:42<05:25, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.6800000000000006e-05, 'epoch': 0.02} 53%|█████▎ | 266/500 [04:42<05:25, 1.39s/it] 53%|█████▎ | 267/500 [04:43<05:22, 1.38s/it] {'loss': 0.0, 'learning_rate': 4.660000000000001e-05, 'epoch': 0.02} 53%|█████▎ | 267/500 [04:43<05:22, 1.38s/it] 54%|█████▎ | 268/500 [04:45<05:23, 1.40s/it] {'loss': 0.0, 'learning_rate': 4.64e-05, 'epoch': 0.02} 54%|█████▎ | 268/500 [04:45<05:23, 1.40s/it] 54%|█████▍ | 269/500 [04:46<05:22, 1.40s/it] {'loss': 0.0, 'learning_rate': 4.6200000000000005e-05, 'epoch': 0.02} 54%|█████▍ | 269/500 [04:46<05:22, 1.40s/it] 54%|█████▍ | 270/500 [04:47<05:25, 1.42s/it] {'loss': 0.0, 'learning_rate': 4.600000000000001e-05, 'epoch': 0.02} 54%|█████▍ | 270/500 [04:47<05:25, 1.42s/it] 54%|█████▍ | 271/500 [04:49<05:23, 
1.41s/it] {'loss': 0.0, 'learning_rate': 4.58e-05, 'epoch': 0.02} 54%|█████▍ | 271/500 [04:49<05:23, 1.41s/it] 54%|█████▍ | 272/500 [04:50<05:20, 1.41s/it] {'loss': 0.0, 'learning_rate': 4.5600000000000004e-05, 'epoch': 0.02} 54%|█████▍ | 272/500 [04:50<05:20, 1.41s/it] 55%|█████▍ | 273/500 [04:52<05:22, 1.42s/it] {'loss': 0.0, 'learning_rate': 4.5400000000000006e-05, 'epoch': 0.02} 55%|█████▍ | 273/500 [04:52<05:22, 1.42s/it] 55%|█████▍ | 274/500 [04:53<05:14, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.52e-05, 'epoch': 0.02} 55%|█████▍ | 274/500 [04:53<05:14, 1.39s/it] 55%|█████▌ | 275/500 [04:55<05:17, 1.41s/it] {'loss': 0.0, 'learning_rate': 4.5e-05, 'epoch': 0.02} 55%|█████▌ | 275/500 [04:55<05:17, 1.41s/it] 55%|█████▌ | 276/500 [04:56<05:15, 1.41s/it] {'loss': 0.0, 'learning_rate': 4.4800000000000005e-05, 'epoch': 0.02} 55%|█████▌ | 276/500 [04:56<05:15, 1.41s/it] 55%|█████▌ | 277/500 [04:57<05:15, 1.42s/it] {'loss': 0.0, 'learning_rate': 4.46e-05, 'epoch': 0.02} 55%|█████▌ | 277/500 [04:57<05:15, 1.42s/it] 56%|█████▌ | 278/500 [04:59<05:17, 1.43s/it] {'loss': 0.0, 'learning_rate': 4.44e-05, 'epoch': 0.02} 56%|█████▌ | 278/500 [04:59<05:17, 1.43s/it] 56%|█████▌ | 279/500 [05:00<05:15, 1.43s/it] {'loss': 0.0, 'learning_rate': 4.4200000000000004e-05, 'epoch': 0.02} 56%|█████▌ | 279/500 [05:00<05:15, 1.43s/it] 56%|█████▌ | 280/500 [05:02<05:16, 1.44s/it] {'loss': 0.0, 'learning_rate': 4.4000000000000006e-05, 'epoch': 0.02} 56%|█████▌ | 280/500 [05:02<05:16, 1.44s/it] 56%|█████▌ | 281/500 [05:03<05:17, 1.45s/it] {'loss': 0.0, 'learning_rate': 4.38e-05, 'epoch': 0.02} 56%|█████▌ | 281/500 [05:03<05:17, 1.45s/it] 56%|█████▋ | 282/500 [05:05<05:17, 1.45s/it] {'loss': 0.0, 'learning_rate': 4.36e-05, 'epoch': 0.02} 56%|█████▋ | 282/500 [05:05<05:17, 1.45s/it] 57%|█████▋ | 283/500 [05:06<05:10, 1.43s/it] {'loss': 0.0, 'learning_rate': 4.3400000000000005e-05, 'epoch': 0.02} 57%|█████▋ | 283/500 [05:06<05:10, 1.43s/it] 57%|█████▋ | 284/500 [05:07<05:08, 1.43s/it] {'loss': 
0.0, 'learning_rate': 4.32e-05, 'epoch': 0.02} 57%|█████▋ | 284/500 [05:07<05:08, 1.43s/it] 57%|█████▋ | 285/500 [05:09<05:08, 1.43s/it] {'loss': 0.0, 'learning_rate': 4.3e-05, 'epoch': 0.02} 57%|█████▋ | 285/500 [05:09<05:08, 1.43s/it] 57%|█████▋ | 286/500 [05:10<05:07, 1.44s/it] {'loss': 0.0, 'learning_rate': 4.2800000000000004e-05, 'epoch': 0.02} 57%|█████▋ | 286/500 [05:10<05:07, 1.44s/it] 57%|█████▋ | 287/500 [05:12<05:06, 1.44s/it] {'loss': 0.0, 'learning_rate': 4.26e-05, 'epoch': 0.02} 57%|█████▋ | 287/500 [05:12<05:06, 1.44s/it] 58%|█████▊ | 288/500 [05:13<05:05, 1.44s/it] {'loss': 0.0, 'learning_rate': 4.24e-05, 'epoch': 0.02} 58%|█████▊ | 288/500 [05:13<05:05, 1.44s/it] 58%|█████▊ | 289/500 [05:15<05:02, 1.43s/it] {'loss': 0.0, 'learning_rate': 4.22e-05, 'epoch': 0.02} 58%|█████▊ | 289/500 [05:15<05:02, 1.43s/it] 58%|█████▊ | 290/500 [05:16<04:38, 1.33s/it] {'loss': 0.0, 'learning_rate': 4.2e-05, 'epoch': 0.02} 58%|█████▊ | 290/500 [05:16<04:38, 1.33s/it] 58%|█████▊ | 291/500 [05:17<04:11, 1.20s/it] {'loss': 0.0, 'learning_rate': 4.18e-05, 'epoch': 0.02} 58%|█████▊ | 291/500 [05:17<04:11, 1.20s/it] 58%|█████▊ | 292/500 [05:18<03:54, 1.13s/it] {'loss': 0.0, 'learning_rate': 4.16e-05, 'epoch': 0.02} 58%|█████▊ | 292/500 [05:18<03:54, 1.13s/it] 59%|█████▊ | 293/500 [05:19<03:42, 1.07s/it] {'loss': 0.0, 'learning_rate': 4.14e-05, 'epoch': 0.02} 59%|█████▊ | 293/500 [05:19<03:42, 1.07s/it] 59%|█████▉ | 294/500 [05:19<03:31, 1.03s/it] {'loss': 0.0, 'learning_rate': 4.12e-05, 'epoch': 0.02} 59%|█████▉ | 294/500 [05:19<03:31, 1.03s/it] 59%|█████▉ | 295/500 [05:20<03:23, 1.01it/s] {'loss': 0.0, 'learning_rate': 4.1e-05, 'epoch': 0.02} 59%|█████▉ | 295/500 [05:20<03:23, 1.01it/s] 59%|█████▉ | 296/500 [05:21<03:16, 1.04it/s] {'loss': 0.0, 'learning_rate': 4.08e-05, 'epoch': 0.02} 59%|█████▉ | 296/500 [05:21<03:16, 1.04it/s] 59%|█████▉ | 297/500 [05:22<03:11, 1.06it/s] {'loss': 0.0, 'learning_rate': 4.0600000000000004e-05, 'epoch': 0.02} 59%|█████▉ | 297/500 
[05:22<03:11, 1.06it/s] 60%|█████▉ | 298/500 [05:23<03:22, 1.00s/it] {'loss': 0.0, 'learning_rate': 4.0400000000000006e-05, 'epoch': 0.02} 60%|█████▉ | 298/500 [05:23<03:22, 1.00s/it] 60%|█████▉ | 299/500 [05:25<03:44, 1.11s/it] {'loss': 0.0, 'learning_rate': 4.02e-05, 'epoch': 0.02} 60%|█████▉ | 299/500 [05:25<03:44, 1.11s/it] 60%|██████ | 300/500 [05:26<04:02, 1.21s/it] {'loss': 0.0, 'learning_rate': 4e-05, 'epoch': 0.02} 60%|██████ | 300/500 [05:26<04:02, 1.21s/it][INFO|tokenization_utils_base.py:2437] 2023-12-10 15:32:48,867 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-300/tokenizer_config.json [INFO|tokenization_utils_base.py:2446] 2023-12-10 15:32:48,867 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-300/special_tokens_map.json 60%|██████ | 301/500 [05:28<04:13, 1.28s/it] {'loss': 0.0, 'learning_rate': 3.9800000000000005e-05, 'epoch': 0.02} 60%|██████ | 301/500 [05:28<04:13, 1.28s/it] 60%|██████ | 302/500 [05:29<04:03, 1.23s/it] {'loss': 0.0, 'learning_rate': 3.960000000000001e-05, 'epoch': 0.02} 60%|██████ | 302/500 [05:29<04:03, 1.23s/it] 61%|██████ | 303/500 [05:30<03:43, 1.14s/it] {'loss': 0.0, 'learning_rate': 3.94e-05, 'epoch': 0.02} 61%|██████ | 303/500 [05:30<03:43, 1.14s/it] 61%|██████ | 304/500 [05:31<03:30, 1.08s/it] {'loss': 0.0, 'learning_rate': 3.9200000000000004e-05, 'epoch': 0.02} 61%|██████ | 304/500 [05:31<03:30, 1.08s/it] 61%|██████ | 305/500 [05:31<03:21, 1.03s/it] {'loss': 0.0, 'learning_rate': 3.9000000000000006e-05, 'epoch': 0.02} 61%|██████ | 305/500 [05:31<03:21, 1.03s/it] 61%|██████ | 306/500 [05:32<03:14, 1.00s/it] {'loss': 0.0, 'learning_rate': 3.88e-05, 'epoch': 0.02} 61%|██████ | 306/500 [05:32<03:14, 1.00s/it] 61%|██████▏ | 307/500 [05:33<03:07, 1.03it/s] {'loss': 0.0, 'learning_rate': 3.86e-05, 'epoch': 0.02} 61%|██████▏ | 307/500 [05:33<03:07, 1.03it/s] 62%|██████▏ | 308/500 [05:34<02:36, 1.22it/s] {'loss': 0.0, 'learning_rate': 3.8400000000000005e-05, 'epoch': 
0.02} 62%|██████▏ | 308/500 [05:34<02:36, 1.22it/s] 62%|██████▏ | 309/500 [05:34<02:15, 1.41it/s] {'loss': 0.0, 'learning_rate': 3.82e-05, 'epoch': 0.02} 62%|██████▏ | 309/500 [05:34<02:15, 1.41it/s] 62%|██████▏ | 310/500 [05:35<01:59, 1.58it/s] {'loss': 0.0, 'learning_rate': 3.8e-05, 'epoch': 0.02} 62%|██████▏ | 310/500 [05:35<01:59, 1.58it/s] 62%|██████▏ | 311/500 [05:35<01:49, 1.73it/s] {'loss': 0.0, 'learning_rate': 3.7800000000000004e-05, 'epoch': 0.02} 62%|██████▏ | 311/500 [05:35<01:49, 1.73it/s] 62%|██████▏ | 312/500 [05:36<01:41, 1.85it/s] {'loss': 0.0, 'learning_rate': 3.76e-05, 'epoch': 0.02} 62%|██████▏ | 312/500 [05:36<01:41, 1.85it/s] 63%|██████▎ | 313/500 [05:36<01:36, 1.95it/s] {'loss': 0.0, 'learning_rate': 3.74e-05, 'epoch': 0.02} 63%|██████▎ | 313/500 [05:36<01:36, 1.95it/s] 63%|██████▎ | 314/500 [05:36<01:35, 1.95it/s] {'loss': 0.0, 'learning_rate': 3.72e-05, 'epoch': 0.02} 63%|██████▎ | 314/500 [05:37<01:35, 1.95it/s] 63%|██████▎ | 315/500 [05:37<01:40, 1.84it/s] {'loss': 0.0, 'learning_rate': 3.7e-05, 'epoch': 0.02} 63%|██████▎ | 315/500 [05:37<01:40, 1.84it/s] 63%|██████▎ | 316/500 [05:38<01:46, 1.74it/s] {'loss': 0.0, 'learning_rate': 3.68e-05, 'epoch': 0.02} 63%|██████▎ | 316/500 [05:38<01:46, 1.74it/s] 63%|██████▎ | 317/500 [05:38<01:51, 1.64it/s] {'loss': 0.0, 'learning_rate': 3.66e-05, 'epoch': 0.02} 63%|██████▎ | 317/500 [05:38<01:51, 1.64it/s] 64%|██████▎ | 318/500 [05:39<01:52, 1.62it/s] {'loss': 0.0, 'learning_rate': 3.6400000000000004e-05, 'epoch': 0.02} 64%|██████▎ | 318/500 [05:39<01:52, 1.62it/s] 64%|██████▍ | 319/500 [05:40<01:44, 1.72it/s] {'loss': 0.0, 'learning_rate': 3.62e-05, 'epoch': 0.02} 64%|██████▍ | 319/500 [05:40<01:44, 1.72it/s] 64%|██████▍ | 320/500 [05:40<01:46, 1.69it/s] {'loss': 0.0, 'learning_rate': 3.6e-05, 'epoch': 0.02} 64%|██████▍ | 320/500 [05:40<01:46, 1.69it/s] 64%|██████▍ | 321/500 [05:41<01:45, 1.70it/s] {'loss': 0.0, 'learning_rate': 3.58e-05, 'epoch': 0.02} 64%|██████▍ | 321/500 [05:41<01:45, 
1.70it/s] 64%|██████▍ | 322/500 [05:41<01:44, 1.70it/s] {'loss': 0.0, 'learning_rate': 3.56e-05, 'epoch': 0.02} 64%|██████▍ | 322/500 [05:41<01:44, 1.70it/s] 65%|██████▍ | 323/500 [05:42<01:44, 1.69it/s] {'loss': 0.0, 'learning_rate': 3.54e-05, 'epoch': 0.02} 65%|██████▍ | 323/500 [05:42<01:44, 1.69it/s] 65%|██████▍ | 324/500 [05:43<01:47, 1.64it/s] {'loss': 0.0, 'learning_rate': 3.52e-05, 'epoch': 0.02} 65%|██████▍ | 324/500 [05:43<01:47, 1.64it/s] 65%|██████▌ | 325/500 [05:43<01:48, 1.62it/s] {'loss': 0.0, 'learning_rate': 3.5e-05, 'epoch': 0.02} 65%|██████▌ | 325/500 [05:43<01:48, 1.62it/s] 65%|██████▌ | 326/500 [05:44<01:51, 1.57it/s] {'loss': 0.0, 'learning_rate': 3.48e-05, 'epoch': 0.03} 65%|██████▌ | 326/500 [05:44<01:51, 1.57it/s] 65%|██████▌ | 327/500 [05:45<02:14, 1.29it/s] {'loss': 0.0, 'learning_rate': 3.46e-05, 'epoch': 0.03} 65%|██████▌ | 327/500 [05:45<02:14, 1.29it/s] 66%|██████▌ | 328/500 [05:46<02:44, 1.05it/s] {'loss': 0.0, 'learning_rate': 3.4399999999999996e-05, 'epoch': 0.03} 66%|██████▌ | 328/500 [05:46<02:44, 1.05it/s] 66%|██████▌ | 329/500 [05:48<03:05, 1.09s/it] {'loss': 0.0, 'learning_rate': 3.4200000000000005e-05, 'epoch': 0.03} 66%|██████▌ | 329/500 [05:48<03:05, 1.09s/it] 66%|██████▌ | 330/500 [05:49<03:11, 1.13s/it] {'loss': 0.0, 'learning_rate': 3.4000000000000007e-05, 'epoch': 0.03} 66%|██████▌ | 330/500 [05:49<03:11, 1.13s/it] 66%|██████▌ | 331/500 [05:50<03:03, 1.09s/it] {'loss': 0.0, 'learning_rate': 3.38e-05, 'epoch': 0.03} 66%|██████▌ | 331/500 [05:50<03:03, 1.09s/it] 66%|██████▋ | 332/500 [05:51<02:51, 1.02s/it] {'loss': 0.0, 'learning_rate': 3.3600000000000004e-05, 'epoch': 0.03} 66%|██████▋ | 332/500 [05:51<02:51, 1.02s/it] 67%|██████▋ | 333/500 [05:52<02:45, 1.01it/s] {'loss': 0.0, 'learning_rate': 3.3400000000000005e-05, 'epoch': 0.03} 67%|██████▋ | 333/500 [05:52<02:45, 1.01it/s] 67%|██████▋ | 334/500 [05:53<02:38, 1.05it/s] {'loss': 0.0, 'learning_rate': 3.32e-05, 'epoch': 0.03} 67%|██████▋ | 334/500 [05:53<02:38, 
1.05it/s] 67%|██████▋ | 335/500 [05:54<02:34, 1.07it/s] {'loss': 0.0, 'learning_rate': 3.3e-05, 'epoch': 0.03} 67%|██████▋ | 335/500 [05:54<02:34, 1.07it/s] 67%|██████▋ | 336/500 [05:54<02:23, 1.14it/s] {'loss': 0.0, 'learning_rate': 3.2800000000000004e-05, 'epoch': 0.03} 67%|██████▋ | 336/500 [05:54<02:23, 1.14it/s] 67%|██████▋ | 337/500 [05:55<02:16, 1.19it/s] {'loss': 0.0, 'learning_rate': 3.26e-05, 'epoch': 0.03} 67%|██████▋ | 337/500 [05:55<02:16, 1.19it/s] 68%|██████▊ | 338/500 [05:56<02:12, 1.22it/s] {'loss': 0.0, 'learning_rate': 3.24e-05, 'epoch': 0.03} 68%|██████▊ | 338/500 [05:56<02:12, 1.22it/s] 68%|██████▊ | 339/500 [05:56<02:00, 1.33it/s] {'loss': 0.0, 'learning_rate': 3.2200000000000003e-05, 'epoch': 0.03} 68%|██████▊ | 339/500 [05:56<02:00, 1.33it/s] 68%|██████▊ | 340/500 [05:57<01:50, 1.45it/s] {'loss': 0.0, 'learning_rate': 3.2000000000000005e-05, 'epoch': 0.03} 68%|██████▊ | 340/500 [05:57<01:50, 1.45it/s] 68%|██████▊ | 341/500 [05:58<01:43, 1.53it/s] {'loss': 0.0, 'learning_rate': 3.18e-05, 'epoch': 0.03} 68%|██████▊ | 341/500 [05:58<01:43, 1.53it/s] 68%|██████▊ | 342/500 [05:58<01:42, 1.54it/s] {'loss': 0.0, 'learning_rate': 3.16e-05, 'epoch': 0.03} 68%|██████▊ | 342/500 [05:58<01:42, 1.54it/s] 69%|██████▊ | 343/500 [05:59<01:39, 1.57it/s] {'loss': 0.0, 'learning_rate': 3.1400000000000004e-05, 'epoch': 0.03} 69%|██████▊ | 343/500 [05:59<01:39, 1.57it/s] 69%|██████▉ | 344/500 [05:59<01:35, 1.63it/s] {'loss': 0.0, 'learning_rate': 3.12e-05, 'epoch': 0.03} 69%|██████▉ | 344/500 [05:59<01:35, 1.63it/s] 69%|██████▉ | 345/500 [06:00<01:34, 1.63it/s] {'loss': 0.0, 'learning_rate': 3.1e-05, 'epoch': 0.03} 69%|██████▉ | 345/500 [06:00<01:34, 1.63it/s] 69%|██████▉ | 346/500 [06:01<01:45, 1.46it/s] {'loss': 0.0, 'learning_rate': 3.08e-05, 'epoch': 0.03} 69%|██████▉ | 346/500 [06:01<01:45, 1.46it/s] 69%|██████▉ | 347/500 [06:02<02:05, 1.22it/s] {'loss': 0.0, 'learning_rate': 3.06e-05, 'epoch': 0.03} 69%|██████▉ | 347/500 [06:02<02:05, 1.22it/s] 70%|██████▉ 
| 348/500 [06:03<02:07, 1.19it/s] {'loss': 0.0, 'learning_rate': 3.04e-05, 'epoch': 0.03} 70%|██████▉ | 348/500 [06:03<02:07, 1.19it/s] 70%|██████▉ | 349/500 [06:04<02:08, 1.18it/s] {'loss': 0.0, 'learning_rate': 3.02e-05, 'epoch': 0.03} 70%|██████▉ | 349/500 [06:04<02:08, 1.18it/s] 70%|███████ | 350/500 [06:05<02:25, 1.03it/s] {'loss': 0.0, 'learning_rate': 3e-05, 'epoch': 0.03} 70%|███████ | 350/500 [06:05<02:25, 1.03it/s][INFO|tokenization_utils_base.py:2437] 2023-12-10 15:33:27,722 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-350/tokenizer_config.json [INFO|tokenization_utils_base.py:2446] 2023-12-10 15:33:27,723 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-350/special_tokens_map.json 70%|███████ | 351/500 [06:06<02:45, 1.11s/it] {'loss': 0.0, 'learning_rate': 2.98e-05, 'epoch': 0.03} 70%|███████ | 351/500 [06:06<02:45, 1.11s/it] 70%|███████ | 352/500 [06:08<02:56, 1.19s/it] {'loss': 0.0, 'learning_rate': 2.96e-05, 'epoch': 0.03} 70%|███████ | 352/500 [06:08<02:56, 1.19s/it] 71%|███████ | 353/500 [06:09<03:04, 1.25s/it] {'loss': 0.0, 'learning_rate': 2.94e-05, 'epoch': 0.03} 71%|███████ | 353/500 [06:09<03:04, 1.25s/it] 71%|███████ | 354/500 [06:11<03:08, 1.29s/it] {'loss': 0.0, 'learning_rate': 2.9199999999999998e-05, 'epoch': 0.03} 71%|███████ | 354/500 [06:11<03:08, 1.29s/it] 71%|███████ | 355/500 [06:12<03:11, 1.32s/it] {'loss': 0.0, 'learning_rate': 2.9e-05, 'epoch': 0.03} 71%|███████ | 355/500 [06:12<03:11, 1.32s/it] 71%|███████ | 356/500 [06:13<03:11, 1.33s/it] {'loss': 0.0, 'learning_rate': 2.88e-05, 'epoch': 0.03} 71%|███████ | 356/500 [06:13<03:11, 1.33s/it] 71%|███████▏ | 357/500 [06:15<03:13, 1.35s/it] {'loss': 0.0, 'learning_rate': 2.86e-05, 'epoch': 0.03} 71%|███████▏ | 357/500 [06:15<03:13, 1.35s/it] 72%|███████▏ | 358/500 [06:16<03:15, 1.38s/it] {'loss': 0.0, 'learning_rate': 2.84e-05, 'epoch': 0.03} 72%|███████▏ | 358/500 [06:16<03:15, 1.38s/it] 72%|███████▏ | 359/500 [06:18<03:14, 
1.38s/it] {'loss': 0.0, 'learning_rate': 2.8199999999999998e-05, 'epoch': 0.03} 72%|███████▏ | 359/500 [06:18<03:14, 1.38s/it] 72%|███████▏ | 360/500 [06:19<03:14, 1.39s/it] {'loss': 0.0, 'learning_rate': 2.8000000000000003e-05, 'epoch': 0.03} 72%|███████▏ | 360/500 [06:19<03:14, 1.39s/it] 72%|███████▏ | 361/500 [06:20<03:14, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.7800000000000005e-05, 'epoch': 0.03} 72%|███████▏ | 361/500 [06:20<03:14, 1.40s/it] 72%|███████▏ | 362/500 [06:22<03:12, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.7600000000000003e-05, 'epoch': 0.03} 72%|███████▏ | 362/500 [06:22<03:12, 1.40s/it] 73%|███████▎ | 363/500 [06:23<03:12, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.7400000000000002e-05, 'epoch': 0.03} 73%|███████▎ | 363/500 [06:23<03:12, 1.40s/it] 73%|███████▎ | 364/500 [06:25<03:11, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.7200000000000004e-05, 'epoch': 0.03} 73%|███████▎ | 364/500 [06:25<03:11, 1.40s/it] 73%|███████▎ | 365/500 [06:26<03:09, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.7000000000000002e-05, 'epoch': 0.03} 73%|███████▎ | 365/500 [06:26<03:09, 1.40s/it] 73%|███████▎ | 366/500 [06:27<03:07, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.6800000000000004e-05, 'epoch': 0.03} 73%|███████▎ | 366/500 [06:27<03:07, 1.40s/it] 73%|███████▎ | 367/500 [06:29<03:06, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.6600000000000003e-05, 'epoch': 0.03} 73%|███████▎ | 367/500 [06:29<03:06, 1.40s/it] 74%|███████▎ | 368/500 [06:30<03:06, 1.41s/it] {'loss': 0.0, 'learning_rate': 2.64e-05, 'epoch': 0.03} 74%|███████▎ | 368/500 [06:30<03:06, 1.41s/it] 74%|███████▍ | 369/500 [06:32<03:04, 1.41s/it] {'loss': 0.0, 'learning_rate': 2.6200000000000003e-05, 'epoch': 0.03} 74%|███████▍ | 369/500 [06:32<03:04, 1.41s/it] 74%|███████▍ | 370/500 [06:33<03:03, 1.41s/it] {'loss': 0.0, 'learning_rate': 2.6000000000000002e-05, 'epoch': 0.03} 74%|███████▍ | 370/500 [06:33<03:03, 1.41s/it] 74%|███████▍ | 371/500 [06:34<03:01, 1.41s/it] {'loss': 0.0, 'learning_rate': 
2.58e-05, 'epoch': 0.03} 74%|███████▍ | 371/500 [06:34<03:01, 1.41s/it] 74%|███████▍ | 372/500 [06:36<02:59, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.5600000000000002e-05, 'epoch': 0.03} 74%|███████▍ | 372/500 [06:36<02:59, 1.40s/it] 75%|███████▍ | 373/500 [06:37<02:56, 1.39s/it] {'loss': 0.0, 'learning_rate': 2.54e-05, 'epoch': 0.03} 75%|███████▍ | 373/500 [06:37<02:56, 1.39s/it] 75%|███████▍ | 374/500 [06:39<02:55, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.5200000000000003e-05, 'epoch': 0.03} 75%|███████▍ | 374/500 [06:39<02:55, 1.40s/it] 75%|███████▌ | 375/500 [06:40<02:53, 1.39s/it] {'loss': 0.0, 'learning_rate': 2.5e-05, 'epoch': 0.03} 75%|███████▌ | 375/500 [06:40<02:53, 1.39s/it] 75%|███████▌ | 376/500 [06:41<02:52, 1.39s/it] {'loss': 0.0, 'learning_rate': 2.48e-05, 'epoch': 0.03} 75%|███████▌ | 376/500 [06:41<02:52, 1.39s/it] 75%|███████▌ | 377/500 [06:43<02:52, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.46e-05, 'epoch': 0.03} 75%|███████▌ | 377/500 [06:43<02:52, 1.40s/it] 76%|███████▌ | 378/500 [06:44<02:50, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.44e-05, 'epoch': 0.03} 76%|███████▌ | 378/500 [06:44<02:50, 1.40s/it] 76%|███████▌ | 379/500 [06:46<02:51, 1.42s/it] {'loss': 0.0, 'learning_rate': 2.4200000000000002e-05, 'epoch': 0.03} 76%|███████▌ | 379/500 [06:46<02:51, 1.42s/it] 76%|███████▌ | 380/500 [06:47<02:51, 1.43s/it] {'loss': 0.0, 'learning_rate': 2.4e-05, 'epoch': 0.03} 76%|███████▌ | 380/500 [06:47<02:51, 1.43s/it] 76%|███████▌ | 381/500 [06:48<02:49, 1.43s/it] {'loss': 0.0, 'learning_rate': 2.38e-05, 'epoch': 0.03} 76%|███████▌ | 381/500 [06:48<02:49, 1.43s/it] 76%|███████▋ | 382/500 [06:50<02:46, 1.41s/it] {'loss': 0.0, 'learning_rate': 2.36e-05, 'epoch': 0.03} 76%|███████▋ | 382/500 [06:50<02:46, 1.41s/it] 77%|███████▋ | 383/500 [06:51<02:45, 1.42s/it] {'loss': 0.0, 'learning_rate': 2.3400000000000003e-05, 'epoch': 0.03} 77%|███████▋ | 383/500 [06:51<02:45, 1.42s/it] 77%|███████▋ | 384/500 [06:53<02:45, 1.42s/it] {'loss': 0.0, 
'learning_rate': 2.32e-05, 'epoch': 0.03} 77%|███████▋ | 384/500 [06:53<02:45, 1.42s/it] 77%|███████▋ | 385/500 [06:54<02:44, 1.43s/it] {'loss': 0.0, 'learning_rate': 2.3000000000000003e-05, 'epoch': 0.03} 77%|███████▋ | 385/500 [06:54<02:44, 1.43s/it] 77%|███████▋ | 386/500 [06:56<02:44, 1.44s/it] {'loss': 0.0, 'learning_rate': 2.2800000000000002e-05, 'epoch': 0.03} 77%|███████▋ | 386/500 [06:56<02:44, 1.44s/it] 77%|███████▋ | 387/500 [06:57<02:42, 1.44s/it] {'loss': 0.0, 'learning_rate': 2.26e-05, 'epoch': 0.03} 77%|███████▋ | 387/500 [06:57<02:42, 1.44s/it] 78%|███████▊ | 388/500 [06:59<02:41, 1.45s/it] {'loss': 0.0, 'learning_rate': 2.2400000000000002e-05, 'epoch': 0.03} 78%|███████▊ | 388/500 [06:59<02:41, 1.45s/it] 78%|███████▊ | 389/500 [07:00<02:38, 1.43s/it] {'loss': 0.0, 'learning_rate': 2.22e-05, 'epoch': 0.03} 78%|███████▊ | 389/500 [07:00<02:38, 1.43s/it] 78%|███████▊ | 390/500 [07:01<02:22, 1.30s/it] {'loss': 0.0, 'learning_rate': 2.2000000000000003e-05, 'epoch': 0.03} 78%|███████▊ | 390/500 [07:01<02:22, 1.30s/it] 78%|███████▊ | 391/500 [07:02<02:08, 1.18s/it] {'loss': 0.0, 'learning_rate': 2.18e-05, 'epoch': 0.03} 78%|███████▊ | 391/500 [07:02<02:08, 1.18s/it] 78%|███████▊ | 392/500 [07:03<01:59, 1.10s/it] {'loss': 0.0, 'learning_rate': 2.16e-05, 'epoch': 0.03} 78%|███████▊ | 392/500 [07:03<01:59, 1.10s/it] 79%|███████▊ | 393/500 [07:04<01:51, 1.04s/it] {'loss': 0.0, 'learning_rate': 2.1400000000000002e-05, 'epoch': 0.03} 79%|███████▊ | 393/500 [07:04<01:51, 1.04s/it] 79%|███████▉ | 394/500 [07:05<01:46, 1.01s/it] {'loss': 0.0, 'learning_rate': 2.12e-05, 'epoch': 0.03} 79%|███████▉ | 394/500 [07:05<01:46, 1.01s/it] 79%|███████▉ | 395/500 [07:05<01:42, 1.03it/s] {'loss': 0.0, 'learning_rate': 2.1e-05, 'epoch': 0.03} 79%|███████▉ | 395/500 [07:05<01:42, 1.03it/s] 79%|███████▉ | 396/500 [07:06<01:35, 1.09it/s] {'loss': 0.0, 'learning_rate': 2.08e-05, 'epoch': 0.03} 79%|███████▉ | 396/500 [07:06<01:35, 1.09it/s] 79%|███████▉ | 397/500 [07:07<01:24, 
1.23it/s] {'loss': 0.0, 'learning_rate': 2.06e-05, 'epoch': 0.03} 79%|███████▉ | 397/500 [07:07<01:24, 1.23it/s] 80%|███████▉ | 398/500 [07:08<01:20, 1.27it/s] {'loss': 0.0, 'learning_rate': 2.04e-05, 'epoch': 0.03} 80%|███████▉ | 398/500 [07:08<01:20, 1.27it/s] 80%|███████▉ | 399/500 [07:08<01:16, 1.33it/s] {'loss': 0.0, 'learning_rate': 2.0200000000000003e-05, 'epoch': 0.03} 80%|███████▉ | 399/500 [07:08<01:16, 1.33it/s] 80%|████████ | 400/500 [07:09<01:17, 1.29it/s] {'loss': 0.0, 'learning_rate': 2e-05, 'epoch': 0.03} 80%|████████ | 400/500 [07:09<01:17, 1.29it/s][INFO|tokenization_utils_base.py:2437] 2023-12-10 15:34:31,811 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-400/tokenizer_config.json [INFO|tokenization_utils_base.py:2446] 2023-12-10 15:34:31,811 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-400/special_tokens_map.json 80%|████████ | 401/500 [07:10<01:18, 1.26it/s] {'loss': 0.0, 'learning_rate': 1.9800000000000004e-05, 'epoch': 0.03} 80%|████████ | 401/500 [07:10<01:18, 1.26it/s] 80%|████████ | 402/500 [07:11<01:34, 1.03it/s] {'loss': 0.0, 'learning_rate': 1.9600000000000002e-05, 'epoch': 0.03} 80%|████████ | 402/500 [07:11<01:34, 1.03it/s] 81%|████████ | 403/500 [07:13<01:43, 1.07s/it] {'loss': 0.0, 'learning_rate': 1.94e-05, 'epoch': 0.03} 81%|████████ | 403/500 [07:13<01:43, 1.07s/it] 81%|████████ | 404/500 [07:14<01:52, 1.18s/it] {'loss': 0.0, 'learning_rate': 1.9200000000000003e-05, 'epoch': 0.03} 81%|████████ | 404/500 [07:14<01:52, 1.18s/it] 81%|████████ | 405/500 [07:15<01:52, 1.19s/it] {'loss': 0.0, 'learning_rate': 1.9e-05, 'epoch': 0.03} 81%|████████ | 405/500 [07:15<01:52, 1.19s/it] 81%|████████ | 406/500 [07:16<01:44, 1.11s/it] {'loss': 0.0, 'learning_rate': 1.88e-05, 'epoch': 0.03} 81%|████████ | 406/500 [07:16<01:44, 1.11s/it] 81%|████████▏ | 407/500 [07:17<01:38, 1.06s/it] {'loss': 0.0, 'learning_rate': 1.86e-05, 'epoch': 0.03} 81%|████████▏ | 407/500 [07:17<01:38, 1.06s/it] 
82%|████████▏ | 408/500 [07:18<01:32, 1.01s/it] {'loss': 0.0, 'learning_rate': 1.84e-05, 'epoch': 0.03} 82%|████████▏ | 408/500 [07:18<01:32, 1.01s/it] 82%|████████▏ | 409/500 [07:19<01:29, 1.02it/s] {'loss': 0.0, 'learning_rate': 1.8200000000000002e-05, 'epoch': 0.03} 82%|████████▏ | 409/500 [07:19<01:29, 1.02it/s] 82%|████████▏ | 410/500 [07:20<01:27, 1.03it/s] {'loss': 0.0, 'learning_rate': 1.8e-05, 'epoch': 0.03} 82%|████████▏ | 410/500 [07:20<01:27, 1.03it/s] 82%|████████▏ | 411/500 [07:21<01:24, 1.06it/s] {'loss': 0.0, 'learning_rate': 1.78e-05, 'epoch': 0.03} 82%|████████▏ | 411/500 [07:21<01:24, 1.06it/s] 82%|████████▏ | 412/500 [07:22<01:23, 1.06it/s] {'loss': 0.0, 'learning_rate': 1.76e-05, 'epoch': 0.03} 82%|████████▏ | 412/500 [07:22<01:23, 1.06it/s] 83%|████████▎ | 413/500 [07:23<01:20, 1.08it/s] {'loss': 0.0, 'learning_rate': 1.74e-05, 'epoch': 0.03} 83%|████████▎ | 413/500 [07:23<01:20, 1.08it/s] 83%|████████▎ | 414/500 [07:23<01:17, 1.11it/s] {'loss': 0.0, 'learning_rate': 1.7199999999999998e-05, 'epoch': 0.03} 83%|████████▎ | 414/500 [07:23<01:17, 1.11it/s] 83%|████████▎ | 415/500 [07:24<01:16, 1.12it/s] {'loss': 0.0, 'learning_rate': 1.7000000000000003e-05, 'epoch': 0.03} 83%|████████▎ | 415/500 [07:24<01:16, 1.12it/s] 83%|████████▎ | 416/500 [07:25<01:14, 1.12it/s] {'loss': 0.0, 'learning_rate': 1.6800000000000002e-05, 'epoch': 0.03} 83%|████████▎ | 416/500 [07:25<01:14, 1.12it/s] 83%|████████▎ | 417/500 [07:26<01:13, 1.13it/s] {'loss': 0.0, 'learning_rate': 1.66e-05, 'epoch': 0.03} 83%|████████▎ | 417/500 [07:26<01:13, 1.13it/s] 84%|████████▎ | 418/500 [07:27<01:15, 1.09it/s] {'loss': 0.0, 'learning_rate': 1.6400000000000002e-05, 'epoch': 0.03} 84%|████████▎ | 418/500 [07:27<01:15, 1.09it/s] 84%|████████▍ | 419/500 [07:28<01:19, 1.02it/s] {'loss': 0.0, 'learning_rate': 1.62e-05, 'epoch': 0.03} 84%|████████▍ | 419/500 [07:28<01:19, 1.02it/s] 84%|████████▍ | 420/500 [07:30<01:29, 1.11s/it] {'loss': 0.0, 'learning_rate': 1.6000000000000003e-05, 
'epoch': 0.03}
84%|████████▍ | 421/500 [07:31<01:36, 1.23s/it] {'loss': 0.0, 'learning_rate': 1.58e-05, 'epoch': 0.03}
84%|████████▍ | 422/500 [07:32<01:39, 1.28s/it] {'loss': 0.0, 'learning_rate': 1.56e-05, 'epoch': 0.03}
85%|████████▍ | 423/500 [07:34<01:41, 1.32s/it] {'loss': 0.0, 'learning_rate': 1.54e-05, 'epoch': 0.03}
85%|████████▍ | 424/500 [07:35<01:42, 1.34s/it] {'loss': 0.0, 'learning_rate': 1.52e-05, 'epoch': 0.03}
85%|████████▌ | 425/500 [07:37<01:43, 1.37s/it] {'loss': 0.0, 'learning_rate': 1.5e-05, 'epoch': 0.03}
85%|████████▌ | 426/500 [07:38<01:42, 1.39s/it] {'loss': 0.0, 'learning_rate': 1.48e-05, 'epoch': 0.03}
85%|████████▌ | 427/500 [07:40<01:40, 1.38s/it] {'loss': 0.0, 'learning_rate': 1.4599999999999999e-05, 'epoch': 0.03}
86%|████████▌ | 428/500 [07:40<01:28, 1.23s/it] {'loss': 0.0, 'learning_rate': 1.44e-05, 'epoch': 0.03}
86%|████████▌ | 429/500 [07:41<01:20, 1.13s/it] {'loss': 0.0, 'learning_rate': 1.42e-05, 'epoch': 0.03}
86%|████████▌ | 430/500 [07:42<01:14, 1.06s/it] {'loss': 0.0, 'learning_rate': 1.4000000000000001e-05, 'epoch': 0.03}
86%|████████▌ | 431/500 [07:43<01:08, 1.00it/s] {'loss': 0.0, 'learning_rate': 1.3800000000000002e-05, 'epoch': 0.03}
86%|████████▋ | 432/500 [07:44<01:05, 1.04it/s] {'loss': 0.0, 'learning_rate': 1.3600000000000002e-05, 'epoch': 0.03}
87%|████████▋ | 433/500 [07:45<01:02, 1.07it/s] {'loss': 0.0, 'learning_rate': 1.3400000000000002e-05, 'epoch': 0.03}
87%|████████▋ | 434/500 [07:46<01:00, 1.09it/s] {'loss': 0.0, 'learning_rate': 1.32e-05, 'epoch': 0.03}
87%|████████▋ | 435/500 [07:47<00:58, 1.11it/s] {'loss': 0.0, 'learning_rate': 1.3000000000000001e-05, 'epoch': 0.03}
87%|████████▋ | 436/500 [07:47<00:54, 1.17it/s] {'loss': 0.0, 'learning_rate': 1.2800000000000001e-05, 'epoch': 0.03}
87%|████████▋ | 437/500 [07:48<00:49, 1.28it/s] {'loss': 0.0, 'learning_rate': 1.2600000000000001e-05, 'epoch': 0.03}
88%|████████▊ | 438/500 [07:48<00:44, 1.40it/s] {'loss': 0.0, 'learning_rate': 1.24e-05, 'epoch': 0.03}
88%|████████▊ | 439/500 [07:49<00:41, 1.46it/s] {'loss': 0.0, 'learning_rate': 1.22e-05, 'epoch': 0.03}
88%|████████▊ | 440/500 [07:50<00:39, 1.53it/s] {'loss': 0.0, 'learning_rate': 1.2e-05, 'epoch': 0.03}
88%|████████▊ | 441/500 [07:50<00:37, 1.56it/s] {'loss': 0.0, 'learning_rate': 1.18e-05, 'epoch': 0.03}
88%|████████▊ | 442/500 [07:51<00:44, 1.31it/s] {'loss': 0.0, 'learning_rate': 1.16e-05, 'epoch': 0.03}
89%|████████▊ | 443/500 [07:53<00:54, 1.05it/s] {'loss': 0.0, 'learning_rate': 1.1400000000000001e-05, 'epoch': 0.03}
89%|████████▉ | 444/500 [07:54<01:00, 1.07s/it] {'loss': 0.0, 'learning_rate': 1.1200000000000001e-05, 'epoch': 0.03}
89%|████████▉ | 445/500 [07:55<01:03, 1.16s/it] {'loss': 0.0, 'learning_rate': 1.1000000000000001e-05, 'epoch': 0.03}
89%|████████▉ | 446/500 [07:56<00:59, 1.10s/it] {'loss': 0.0, 'learning_rate': 1.08e-05, 'epoch': 0.03}
89%|████████▉ | 447/500 [07:57<00:55, 1.04s/it] {'loss': 0.0, 'learning_rate': 1.06e-05, 'epoch': 0.03}
90%|████████▉ | 448/500 [07:58<00:51, 1.00it/s] {'loss': 0.0, 'learning_rate': 1.04e-05, 'epoch': 0.03}
90%|████████▉ | 449/500 [07:59<00:46, 1.09it/s] {'loss': 0.0, 'learning_rate': 1.02e-05, 'epoch': 0.03}
90%|█████████ | 450/500 [08:00<00:45, 1.10it/s] {'loss': 0.0, 'learning_rate': 1e-05, 'epoch': 0.03}
[INFO|tokenization_utils_base.py:2437] 2023-12-10 15:35:22,531 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-450/tokenizer_config.json
[INFO|tokenization_utils_base.py:2446] 2023-12-10 15:35:22,531 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-450/special_tokens_map.json
90%|█████████ | 451/500 [08:01<00:44, 1.10it/s] {'loss': 0.0, 'learning_rate': 9.800000000000001e-06, 'epoch': 0.03}
90%|█████████ | 452/500 [08:02<00:42, 1.12it/s] {'loss': 0.0, 'learning_rate': 9.600000000000001e-06, 'epoch': 0.03}
91%|█████████ | 453/500 [08:02<00:41, 1.13it/s] {'loss': 0.0, 'learning_rate': 9.4e-06, 'epoch': 0.03}
91%|█████████ | 454/500 [08:03<00:40, 1.13it/s] {'loss': 0.0, 'learning_rate': 9.2e-06, 'epoch': 0.03}
91%|█████████ | 455/500 [08:04<00:38, 1.17it/s] {'loss': 0.0, 'learning_rate': 9e-06, 'epoch': 0.03}
91%|█████████ | 456/500 [08:05<00:34, 1.26it/s] {'loss': 0.0, 'learning_rate': 8.8e-06, 'epoch': 0.04}
91%|█████████▏| 457/500 [08:05<00:31, 1.35it/s] {'loss': 0.0, 'learning_rate': 8.599999999999999e-06, 'epoch': 0.04}
92%|█████████▏| 458/500 [08:06<00:29, 1.40it/s] {'loss': 0.0, 'learning_rate': 8.400000000000001e-06, 'epoch': 0.04}
92%|█████████▏| 459/500 [08:07<00:27, 1.49it/s] {'loss': 0.0, 'learning_rate': 8.200000000000001e-06, 'epoch': 0.04}
92%|█████████▏| 460/500 [08:07<00:26, 1.51it/s] {'loss': 0.0, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.04}
92%|█████████▏| 461/500 [08:08<00:25, 1.51it/s] {'loss': 0.0, 'learning_rate': 7.8e-06, 'epoch': 0.04}
92%|█████████▏| 462/500 [08:08<00:24, 1.58it/s] {'loss': 0.0, 'learning_rate': 7.6e-06, 'epoch': 0.04}
93%|█████████▎| 463/500 [08:09<00:24, 1.52it/s] {'loss': 0.0, 'learning_rate': 7.4e-06, 'epoch': 0.04}
93%|█████████▎| 464/500 [08:10<00:22, 1.63it/s] {'loss': 0.0, 'learning_rate': 7.2e-06, 'epoch': 0.04}
93%|█████████▎| 465/500 [08:10<00:21, 1.64it/s] {'loss': 0.0, 'learning_rate': 7.000000000000001e-06, 'epoch': 0.04}
93%|█████████▎| 466/500 [08:11<00:21, 1.60it/s] {'loss': 0.0, 'learning_rate': 6.800000000000001e-06, 'epoch': 0.04}
93%|█████████▎| 467/500 [08:12<00:21, 1.56it/s] {'loss': 0.0, 'learning_rate': 6.6e-06, 'epoch': 0.04}
94%|█████████▎| 468/500 [08:13<00:25, 1.26it/s] {'loss': 0.0, 'learning_rate': 6.4000000000000006e-06, 'epoch': 0.04}
94%|█████████▍| 469/500 [08:14<00:29, 1.04it/s] {'loss': 0.0, 'learning_rate': 6.2e-06, 'epoch': 0.04}
94%|█████████▍| 470/500 [08:16<00:33, 1.12s/it] {'loss': 0.0, 'learning_rate': 6e-06, 'epoch': 0.04}
94%|█████████▍| 471/500 [08:17<00:35, 1.21s/it] {'loss': 0.0, 'learning_rate': 5.8e-06, 'epoch': 0.04}
94%|█████████▍| 472/500 [08:18<00:35, 1.25s/it] {'loss': 0.0, 'learning_rate': 5.600000000000001e-06, 'epoch': 0.04}
95%|█████████▍| 473/500 [08:20<00:34, 1.29s/it] {'loss': 0.0, 'learning_rate': 5.4e-06, 'epoch': 0.04}
95%|█████████▍| 474/500 [08:21<00:34, 1.33s/it] {'loss': 0.0, 'learning_rate': 5.2e-06, 'epoch': 0.04}
95%|█████████▌| 475/500 [08:23<00:33, 1.35s/it] {'loss': 0.0, 'learning_rate': 5e-06, 'epoch': 0.04}
95%|█████████▌| 476/500 [08:24<00:32, 1.34s/it] {'loss': 0.0, 'learning_rate': 4.800000000000001e-06, 'epoch': 0.04}
95%|█████████▌| 477/500 [08:25<00:31, 1.36s/it] {'loss': 0.0, 'learning_rate': 4.6e-06, 'epoch': 0.04}
96%|█████████▌| 478/500 [08:27<00:30, 1.37s/it] {'loss': 0.0, 'learning_rate': 4.4e-06, 'epoch': 0.04}
96%|█████████▌| 479/500 [08:28<00:28, 1.37s/it] {'loss': 0.0, 'learning_rate': 4.2000000000000004e-06, 'epoch': 0.04}
96%|█████████▌| 480/500 [08:30<00:28, 1.41s/it] {'loss': 0.0, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.04}
96%|█████████▌| 481/500 [08:31<00:26, 1.41s/it] {'loss': 0.0, 'learning_rate': 3.8e-06, 'epoch': 0.04}
96%|█████████▋| 482/500 [08:32<00:25, 1.41s/it] {'loss': 0.0, 'learning_rate': 3.6e-06, 'epoch': 0.04}
97%|█████████▋| 483/500 [08:34<00:24, 1.42s/it] {'loss': 0.0, 'learning_rate': 3.4000000000000005e-06, 'epoch': 0.04}
97%|█████████▋| 484/500 [08:35<00:22, 1.43s/it] {'loss': 0.0, 'learning_rate': 3.2000000000000003e-06, 'epoch': 0.04}
97%|█████████▋| 485/500 [08:37<00:21, 1.44s/it] {'loss': 0.0, 'learning_rate': 3e-06, 'epoch': 0.04}
97%|█████████▋| 486/500 [08:38<00:20, 1.43s/it] {'loss': 0.0, 'learning_rate': 2.8000000000000003e-06, 'epoch': 0.04}
97%|█████████▋| 487/500 [08:40<00:18, 1.44s/it] {'loss': 0.0, 'learning_rate': 2.6e-06, 'epoch': 0.04}
98%|█████████▊| 488/500 [08:41<00:17, 1.43s/it] {'loss': 0.0, 'learning_rate': 2.4000000000000003e-06, 'epoch': 0.04}
98%|█████████▊| 489/500 [08:42<00:15, 1.42s/it] {'loss': 0.0, 'learning_rate': 2.2e-06, 'epoch': 0.04}
98%|█████████▊| 490/500 [08:44<00:14, 1.43s/it] {'loss': 0.0, 'learning_rate': 2.0000000000000003e-06, 'epoch': 0.04}
98%|█████████▊| 491/500 [08:45<00:12, 1.44s/it] {'loss': 0.0, 'learning_rate': 1.8e-06, 'epoch': 0.04}
98%|█████████▊| 492/500 [08:47<00:11, 1.45s/it] {'loss': 0.0, 'learning_rate': 1.6000000000000001e-06, 'epoch': 0.04}
99%|█████████▊| 493/500 [08:48<00:09, 1.39s/it] {'loss': 0.0, 'learning_rate': 1.4000000000000001e-06, 'epoch': 0.04}
99%|█████████▉| 494/500 [08:49<00:07, 1.25s/it] {'loss': 0.0, 'learning_rate': 1.2000000000000002e-06, 'epoch': 0.04}
99%|█████████▉| 495/500 [08:50<00:05, 1.15s/it] {'loss': 0.0, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.04}
99%|█████████▉| 496/500 [08:51<00:04, 1.08s/it] {'loss': 0.0, 'learning_rate': 8.000000000000001e-07, 'epoch': 0.04}
99%|█████████▉| 497/500 [08:52<00:03, 1.03s/it] {'loss': 0.0, 'learning_rate': 6.000000000000001e-07, 'epoch': 0.04}
100%|█████████▉| 498/500 [08:53<00:01, 1.01it/s] {'loss': 0.0, 'learning_rate': 4.0000000000000003e-07, 'epoch': 0.04}
100%|█████████▉| 499/500 [08:54<00:00, 1.04it/s] {'loss': 0.0, 'learning_rate': 2.0000000000000002e-07, 'epoch': 0.04}
100%|██████████| 500/500 [08:54<00:00, 1.13it/s] {'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.04}
[INFO|tokenization_utils_base.py:2437] 2023-12-10 15:36:16,994 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2446] 2023-12-10 15:36:16,994 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-500/special_tokens_map.json
[INFO|trainer.py:2017] 2023-12-10 15:36:17,040 >> Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 536.6749, 'train_samples_per_second': 3.727, 'train_steps_per_second': 0.932, 'train_loss': 0.002908447265625, 'epoch': 0.04}
100%|██████████| 500/500 [08:54<00:00, 1.07s/it]
[INFO|tokenization_utils_base.py:2437] 2023-12-10 15:36:17,061 >> tokenizer config file saved in output/text-20231210-152648-1e-4/tokenizer_config.json
[INFO|tokenization_utils_base.py:2446] 2023-12-10 15:36:17,061 >> Special tokens file saved in output/text-20231210-152648-1e-4/special_tokens_map.json
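The logged learning_rate values are consistent with a linear decay from 1e-4 to 0 over 500 steps (lr_scheduler_type=linear, warmup_steps=0, max_steps=500 in the training arguments). The sketch below recomputes a few of them, assuming the schedule lr(step) = 1e-4 * (500 - step) / 500, and also rechecks the final throughput figures. The helper name linear_lr is hypothetical, and the world size of 2 is inferred from the reported samples per second (per_device_train_batch_size=1, gradient_accumulation_steps=2); it is not stated in this excerpt.

```python
# Sketch: recompute logged values, assuming the "linear" schedule with
# warmup_steps=0 (from the training arguments) and a world size of 2
# (inferred from the reported throughput, not stated in the log).
BASE_LR = 1e-4    # learning_rate from the training arguments
MAX_STEPS = 500   # max_steps from the training arguments


def linear_lr(step: int, base_lr: float = BASE_LR, max_steps: int = MAX_STEPS,
              warmup_steps: int = 0) -> float:
    """Hypothetical helper: lr after `step` optimizer steps (linear warmup, then linear decay to 0)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))


# (global step, logged learning_rate) pairs copied from the log above
logged = {421: 1.58e-05, 450: 1e-05, 475: 5e-06, 499: 2.0000000000000002e-07, 500: 0.0}
for step, lr in logged.items():
    assert abs(linear_lr(step) - lr) < 1e-12, (step, linear_lr(step), lr)

# Throughput: max_steps * per_device_batch * grad_accum * world_size / runtime
runtime = 536.6749   # train_runtime in seconds
world_size = 2       # assumption inferred from the numbers
samples = MAX_STEPS * 1 * 2 * world_size
print(round(samples / runtime, 3), round(MAX_STEPS / runtime, 3))  # 3.727 0.932
```

If the run actually used a different number of processes, only the samples-per-second line changes; the learning-rate check depends only on max_steps and the base rate.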