[2023-12-10 15:26:50,373] torch.distributed.run: [WARNING] master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
[2023-12-10 15:26:50,373] torch.distributed.run: [WARNING]
[2023-12-10 15:26:50,373] torch.distributed.run: [WARNING] *****************************************
[2023-12-10 15:26:50,373] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2023-12-10 15:26:50,373] torch.distributed.run: [WARNING] *****************************************
12/10/2023 15:26:54 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: False
12/10/2023 15:26:54 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=2,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0001,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=output/text-20231210-152648-1e-4/runs/Dec10_15-26-53_lily-gpu07,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=500,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
output_dir=output/text-20231210-152648-1e-4,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=1,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=output/text-20231210-152648-1e-4,
save_on_each_node=False,
save_safetensors=False,
save_steps=50,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
[INFO|tokenization_utils_base.py:2043] 2023-12-10 15:26:55,236 >> loading file tokenizer.model from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/tokenizer.model
[INFO|tokenization_utils_base.py:2043] 2023-12-10 15:26:55,236 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2043] 2023-12-10 15:26:55,236 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2043] 2023-12-10 15:26:55,236 >> loading file tokenizer_config.json from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/tokenizer_config.json
[INFO|tokenization_utils_base.py:2043] 2023-12-10 15:26:55,236 >> loading file tokenizer.json from cache at None
[INFO|configuration_utils.py:715] 2023-12-10 15:26:55,589 >> loading configuration file config.json from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/config.json
[INFO|configuration_utils.py:715] 2023-12-10 15:26:55,852 >> loading configuration file config.json from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/config.json
[INFO|configuration_utils.py:775] 2023-12-10 15:26:55,853 >> Model config ChatGLMConfig {
"_name_or_path": "THUDM/chatglm3-6b-base",
"add_bias_linear": false,
"add_qkv_bias": true,
"apply_query_key_layer_scaling": true,
"apply_residual_connection_post_layernorm": false,
"architectures": [
"ChatGLMModel"
],
"attention_dropout": 0.0,
"attention_softmax_in_fp32": true,
"auto_map": {
"AutoConfig": "THUDM/chatglm3-6b-base--configuration_chatglm.ChatGLMConfig",
"AutoModel": "THUDM/chatglm3-6b-base--modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForCausalLM": "THUDM/chatglm3-6b-base--modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSeq2SeqLM": "THUDM/chatglm3-6b-base--modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSequenceClassification": "THUDM/chatglm3-6b-base--modeling_chatglm.ChatGLMForSequenceClassification"
},
"bias_dropout_fusion": true,
"classifier_dropout": null,
"eos_token_id": 2,
"ffn_hidden_size": 13696,
"fp32_residual_connection": false,
"hidden_dropout": 0.0,
"hidden_size": 4096,
"kv_channels": 128,
"layernorm_epsilon": 1e-05,
"model_type": "chatglm",
"multi_query_attention": true,
"multi_query_group_num": 2,
"num_attention_heads": 32,
"num_layers": 28,
"original_rope": true,
"pad_token_id": 0,
"padded_vocab_size": 65024,
"post_layer_norm": true,
"pre_seq_len": null,
"prefix_projection": false,
"quantization_bit": 0,
"rmsnorm": true,
"seq_length": 32768,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.34.0",
"use_cache": true,
"vocab_size": 65024
}
12/10/2023 15:26:55 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, 16-bits training: False
[INFO|modeling_utils.py:2993] 2023-12-10 15:26:56,183 >> loading weights file pytorch_model.bin from cache at /home/haiyue/.cache/huggingface/hub/models--THUDM--chatglm3-6b-base/snapshots/f91a1de587fdc692073367198e65369669a0b49d/pytorch_model.bin.index.json
[INFO|configuration_utils.py:770] 2023-12-10 15:26:56,185 >> Generate config GenerationConfig {
"eos_token_id": 2,
"pad_token_id": 0
}
Loading checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]
Loading checkpoint shards:  14%|█▍        | 1/7 [00:01<00:10, 1.83s/it]
Loading checkpoint shards:  14%|█▍        | 1/7 [00:01<00:11, 1.85s/it]
Loading checkpoint shards:  29%|██▊       | 2/7 [00:03<00:09, 1.83s/it]
Loading checkpoint shards:  29%|██▊       | 2/7 [00:03<00:09, 1.91s/it]
Loading checkpoint shards:  43%|████▎     | 3/7 [00:05<00:07, 1.98s/it]
Loading checkpoint shards:  43%|████▎     | 3/7 [00:05<00:07, 1.99s/it]
Loading checkpoint shards:  57%|█████▋    | 4/7 [00:07<00:05, 1.98s/it]
Loading checkpoint shards:  57%|█████▋    | 4/7 [00:07<00:05, 1.95s/it]
Loading checkpoint shards:  71%|███████▏  | 5/7 [00:10<00:04, 2.08s/it]
Loading checkpoint shards:  71%|███████▏  | 5/7 [00:09<00:03, 1.93s/it]
Loading checkpoint shards:  86%|████████▌ | 6/7 [00:12<00:02, 2.07s/it]
Loading checkpoint shards:  86%|████████▌ | 6/7 [00:11<00:01, 1.93s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:13<00:00, 1.79s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:13<00:00, 1.90s/it]
[INFO|modeling_utils.py:3775] 2023-12-10 15:27:09,563 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.
[INFO|modeling_utils.py:3783] 2023-12-10 15:27:09,564 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at THUDM/chatglm3-6b-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
[INFO|modeling_utils.py:3352] 2023-12-10 15:27:09,818 >> Generation config file not found, using a generation config created from the model config.
Loading checkpoint shards: 100%|██████████| 7/7 [00:12<00:00, 1.64s/it]
Loading checkpoint shards: 100%|██████████| 7/7 [00:12<00:00, 1.81s/it]
Train dataset size: 52002
Sanity Check >>>>>>>>>>>>>
'[gMASK]': 64790 -> -100
'sop': 64792 -> -100
'Instruction': 29101 -> -100
':': 30954 -> -100
'Give': 10465 -> -100
'three': 1194 -> -100
'tips': 6639 -> -100
'for': 332 -> -100
'staying': 10061 -> -100
'healthy': 4651 -> -100
'.': 30930 -> -100
'\n': 13 -> -100
'An': 4244 -> -100
'sw': 1902 -> -100
'er': 266 -> -100
':': 30954 -> -100
'': 30910 -> -100
'': 30910 -> 30910
'1': 30939 -> 30939
'.': 30930 -> 30930
'E': 30950 -> 30950
'at': 269 -> 269
'a': 260 -> 260
'balanced': 12949 -> 12949
'diet': 5546 -> 5546
'and': 293 -> 293
'make': 794 -> 794
'sure': 1506 -> 1506
'to': 289 -> 289
'include': 1860 -> 1860
'plenty': 5765 -> 5765
'of': 290 -> 290
'fruits': 13665 -> 13665
'and': 293 -> 293
'vegetables': 11567 -> 11567
'.': 30930 -> 30930
'': 30910 -> 30910
'\n': 13 -> 13
'2': 30943 -> 30943
'.': 30930 -> 30930
'Exercise': 23340 -> 23340
'regularly': 7414 -> 7414
'to': 289 -> 289
'keep': 1407 -> 1407
'your': 475 -> 475
'body': 1934 -> 1934
'active': 4047 -> 4047
'and': 293 -> 293
'strong': 2034 -> 2034
'.': 30930 -> 30930
'': 30910 -> 30910
'\n': 13 -> 13
'3': 30966 -> 30966
'.': 30930 -> 30930
'Get': 3286 -> 3286
'enough': 1775 -> 1775
'sleep': 4039 -> 4039
'and': 293 -> 293
'maintain': 3165 -> 3165
'a': 260 -> 260
'consistent': 7096 -> 7096
'sleep': 4039 -> 4039
'schedule': 5821 -> 5821
'.': 30930 -> 30930
'': 2 -> 2
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
<<<<<<<<<<<<< Sanity Check
Train dataset size: 52002
Sanity Check >>>>>>>>>>>>>
'[gMASK]': 64790 -> -100
'sop': 64792 -> -100
'Instruction': 29101 -> -100
':': 30954 -> -100
'Give': 10465 -> -100
'three': 1194 -> -100
'tips': 6639 -> -100
'for': 332 -> -100
'staying': 10061 -> -100
'healthy': 4651 -> -100
'.': 30930 -> -100
'\n': 13 -> -100
'An': 4244 -> -100
'sw': 1902 -> -100
'er': 266 -> -100
':': 30954 -> -100
'': 30910 -> -100
'': 30910 -> 30910
'1': 30939 -> 30939
'.': 30930 -> 30930
'E': 30950 -> 30950
'at': 269 -> 269
'a': 260 -> 260
'balanced': 12949 -> 12949
'diet': 5546 -> 5546
'and': 293 -> 293
'make': 794 -> 794
'sure': 1506 -> 1506
'to': 289 -> 289
'include': 1860 -> 1860
'plenty': 5765 -> 5765
'of': 290 -> 290
'fruits': 13665 -> 13665
'and': 293 -> 293
'vegetables': 11567 -> 11567
'.': 30930 -> 30930
'': 30910 -> 30910
'\n': 13 -> 13
'2': 30943 -> 30943
'.': 30930 -> 30930
'Exercise': 23340 -> 23340
'regularly': 7414 -> 7414
'to': 289 -> 289
'keep': 1407 -> 1407
'your': 475 -> 475
'body': 1934 -> 1934
'active': 4047 -> 4047
'and': 293 -> 293
'strong': 2034 -> 2034
'.': 30930 -> 30930
'': 30910 -> 30910
'\n': 13 -> 13
'3': 30966 -> 30966
'.': 30930 -> 30930
'Get': 3286 -> 3286
'enough': 1775 -> 1775
'sleep': 4039 -> 4039
'and': 293 -> 293
'maintain': 3165 -> 3165
'a': 260 -> 260
'consistent': 7096 -> 7096
'sleep': 4039 -> 4039
'schedule': 5821 -> 5821
'.': 30930 -> 30930
'': 2 -> 2
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
'': 0 -> -100
<<<<<<<<<<<<< Sanity Check
[INFO|trainer.py:576] 2023-12-10 15:27:18,453 >> max_steps is given, it will override any value given in num_train_epochs
[INFO|trainer.py:1760] 2023-12-10 15:27:20,364 >> ***** Running training *****
[INFO|trainer.py:1761] 2023-12-10 15:27:20,364 >> Num examples = 52,002
[INFO|trainer.py:1762] 2023-12-10 15:27:20,364 >> Num Epochs = 1
[INFO|trainer.py:1763] 2023-12-10 15:27:20,364 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1766] 2023-12-10 15:27:20,364 >> Total train batch size (w. parallel, distributed & accumulation) = 4
[INFO|trainer.py:1767] 2023-12-10 15:27:20,365 >> Gradient Accumulation steps = 2
[INFO|trainer.py:1768] 2023-12-10 15:27:20,365 >> Total optimization steps = 500
[INFO|trainer.py:1769] 2023-12-10 15:27:20,366 >> Number of trainable parameters = 1,949,696
0%| | 0/500 [00:00<?, ?it/s]
[W reducer.cpp:1346] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
0%| | 1/500 [00:02<22:59, 2.76s/it] {'loss': 1.4542, 'learning_rate': 9.98e-05, 'epoch': 0.0}
0%| | 2/500 [00:03<12:28, 1.50s/it] {'loss': 0.0, 'learning_rate': 9.960000000000001e-05, 'epoch': 0.0}
1%| | 3/500 [00:04<09:28, 1.14s/it] {'loss': 0.0, 'learning_rate': 9.94e-05, 'epoch': 0.0}
1%| | 4/500 [00:04<07:36, 1.09it/s] {'loss': 0.0, 'learning_rate': 9.92e-05, 'epoch': 0.0}
1%| | 5/500 [00:05<06:33, 1.26it/s] {'loss': 0.0, 'learning_rate': 9.900000000000001e-05, 'epoch': 0.0}
1%| | 6/500 [00:05<06:17, 1.31it/s] {'loss': 0.0, 'learning_rate': 9.88e-05, 'epoch': 0.0}
1%|▏ | 7/500 [00:06<05:57, 1.38it/s] {'loss': 0.0, 'learning_rate': 9.86e-05, 'epoch': 0.0}
2%|▏ | 8/500 [00:07<06:11, 1.32it/s] {'loss': 0.0, 'learning_rate': 9.84e-05, 'epoch': 0.0}
2%|▏ | 9/500 [00:08<05:58, 1.37it/s] {'loss': 0.0, 'learning_rate': 9.82e-05, 'epoch': 0.0}
2%|▏ | 10/500 [00:08<05:45, 1.42it/s] {'loss': 0.0, 'learning_rate': 9.8e-05, 'epoch': 0.0}
2%|▏ | 11/500 [00:09<06:06, 1.33it/s] {'loss': 0.0, 'learning_rate': 9.78e-05, 'epoch': 0.0}
2%|▏ | 12/500 [00:10<06:39, 1.22it/s] {'loss': 0.0, 'learning_rate': 9.76e-05, 'epoch': 0.0}
3%|▎ | 13/500 [00:11<07:58, 1.02it/s] {'loss': 0.0, 'learning_rate': 9.74e-05, 'epoch': 0.0}
3%|▎ | 14/500 [00:13<08:59, 1.11s/it] {'loss': 0.0, 'learning_rate': 9.72e-05, 'epoch': 0.0}
3%|▎ | 15/500 [00:14<09:43, 1.20s/it] {'loss': 0.0, 'learning_rate': 9.7e-05, 'epoch': 0.0}
3%|▎ | 16/500 [00:16<10:14, 1.27s/it] {'loss': 0.0, 'learning_rate': 9.680000000000001e-05, 'epoch': 0.0}
3%|▎ | 17/500 [00:17<10:37, 1.32s/it] {'loss': 0.0, 'learning_rate': 9.66e-05, 'epoch': 0.0}
4%|▎ | 18/500 [00:19<10:49, 1.35s/it] {'loss': 0.0, 'learning_rate': 9.64e-05, 'epoch': 0.0}
4%|▍ | 19/500 [00:20<10:59, 1.37s/it] {'loss': 0.0, 'learning_rate': 9.620000000000001e-05, 'epoch': 0.0}
4%|▍ | 20/500 [00:21<11:04, 1.38s/it] {'loss': 0.0, 'learning_rate': 9.6e-05, 'epoch': 0.0}
4%|▍ | 21/500 [00:23<11:02, 1.38s/it] {'loss': 0.0, 'learning_rate': 9.58e-05, 'epoch': 0.0}
4%|▍ | 22/500 [00:24<11:10, 1.40s/it] {'loss': 0.0, 'learning_rate': 9.56e-05, 'epoch': 0.0}
5%|▍ | 23/500 [00:26<11:08, 1.40s/it] {'loss': 0.0, 'learning_rate': 9.54e-05, 'epoch': 0.0}
5%|▍ | 24/500 [00:27<11:12, 1.41s/it] {'loss': 0.0, 'learning_rate': 9.52e-05, 'epoch': 0.0}
5%|▌ | 25/500 [00:28<11:06, 1.40s/it] {'loss': 0.0, 'learning_rate': 9.5e-05, 'epoch': 0.0}
5%|▌ | 26/500 [00:30<11:10, 1.42s/it] {'loss': 0.0, 'learning_rate': 9.48e-05, 'epoch': 0.0}
5%|▌ | 27/500 [00:31<11:08, 1.41s/it] {'loss': 0.0, 'learning_rate': 9.46e-05, 'epoch': 0.0}
6%|▌ | 28/500 [00:33<11:06, 1.41s/it] {'loss': 0.0, 'learning_rate': 9.44e-05, 'epoch': 0.0}
6%|▌ | 29/500 [00:34<11:09, 1.42s/it] {'loss': 0.0, 'learning_rate': 9.42e-05, 'epoch': 0.0}
6%|▌ | 30/500 [00:36<11:04, 1.41s/it] {'loss': 0.0, 'learning_rate': 9.4e-05, 'epoch': 0.0}
6%|▌ | 31/500 [00:37<10:57, 1.40s/it] {'loss': 0.0, 'learning_rate': 9.38e-05, 'epoch': 0.0}
6%|▋ | 32/500 [00:38<10:53, 1.40s/it] {'loss': 0.0, 'learning_rate': 9.360000000000001e-05, 'epoch': 0.0}
7%|▋ | 33/500 [00:40<10:48, 1.39s/it] {'loss': 0.0, 'learning_rate': 9.340000000000001e-05, 'epoch': 0.0}
7%|▋ | 34/500 [00:41<10:57, 1.41s/it] {'loss': 0.0, 'learning_rate': 9.320000000000002e-05, 'epoch': 0.0}
7%|▋ | 35/500 [00:43<10:56, 1.41s/it] {'loss': 0.0, 'learning_rate': 9.300000000000001e-05, 'epoch': 0.0}
7%|▋ | 36/500 [00:44<10:55, 1.41s/it] {'loss': 0.0, 'learning_rate': 9.28e-05, 'epoch': 0.0}
7%|▋ | 37/500 [00:45<10:56, 1.42s/it] {'loss': 0.0, 'learning_rate': 9.260000000000001e-05, 'epoch': 0.0}
8%|▊ | 38/500 [00:47<10:53, 1.42s/it] {'loss': 0.0, 'learning_rate': 9.240000000000001e-05, 'epoch': 0.0}
8%|▊ | 39/500 [00:48<10:54, 1.42s/it] {'loss': 0.0, 'learning_rate': 9.22e-05, 'epoch': 0.0}
8%|▊ | 40/500 [00:50<10:56, 1.43s/it] {'loss': 0.0, 'learning_rate': 9.200000000000001e-05, 'epoch': 0.0}
8%|▊ | 41/500 [00:51<10:54, 1.43s/it] {'loss': 0.0, 'learning_rate': 9.180000000000001e-05, 'epoch': 0.0}
8%|▊ | 42/500 [00:53<10:53, 1.43s/it] {'loss': 0.0, 'learning_rate': 9.16e-05, 'epoch': 0.0}
9%|▊ | 43/500 [00:54<10:50, 1.42s/it] {'loss': 0.0, 'learning_rate': 9.140000000000001e-05, 'epoch': 0.0}
9%|▉ | 44/500 [00:55<10:53, 1.43s/it] {'loss': 0.0, 'learning_rate': 9.120000000000001e-05, 'epoch': 0.0}
9%|▉ | 45/500 [00:57<10:53, 1.44s/it] {'loss': 0.0, 'learning_rate': 9.1e-05, 'epoch': 0.0}
9%|▉ | 46/500 [00:58<10:57, 1.45s/it] {'loss': 0.0, 'learning_rate': 9.080000000000001e-05, 'epoch': 0.0}
9%|▉ | 47/500 [01:00<10:59, 1.46s/it] {'loss': 0.0, 'learning_rate': 9.06e-05, 'epoch': 0.0}
10%|▉ | 48/500 [01:01<10:55, 1.45s/it] {'loss': 0.0, 'learning_rate': 9.04e-05, 'epoch': 0.0}
10%|▉ | 49/500 [01:03<10:57, 1.46s/it] {'loss': 0.0, 'learning_rate': 9.020000000000001e-05, 'epoch': 0.0}
10%|█ | 50/500 [01:04<10:51, 1.45s/it] {'loss': 0.0, 'learning_rate': 9e-05, 'epoch': 0.0}
[INFO|tokenization_utils_base.py:2437] 2023-12-10 15:28:26,862 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-50/tokenizer_config.json
[INFO|tokenization_utils_base.py:2446] 2023-12-10 15:28:26,862 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-50/special_tokens_map.json
10%|█ | 51/500 [01:06<11:00, 1.47s/it] {'loss': 0.0, 'learning_rate': 8.98e-05, 'epoch': 0.0}
10%|█ | 52/500 [01:07<11:01, 1.48s/it] {'loss': 0.0, 'learning_rate': 8.960000000000001e-05, 'epoch': 0.0}
11%|█ | 53/500 [01:09<10:57, 1.47s/it] {'loss': 0.0, 'learning_rate': 8.94e-05, 'epoch': 0.0}
11%|█ | 54/500 [01:10<10:57, 1.47s/it] {'loss': 0.0, 'learning_rate': 8.92e-05, 'epoch': 0.0}
11%|█ | 55/500 [01:12<10:52, 1.47s/it] {'loss': 0.0, 'learning_rate': 8.900000000000001e-05, 'epoch': 0.0}
11%|█ | 56/500 [01:13<10:48, 1.46s/it] {'loss': 0.0, 'learning_rate': 8.88e-05, 'epoch': 0.0}
11%|█▏ | 57/500 [01:14<10:42, 1.45s/it] {'loss': 0.0, 'learning_rate': 8.86e-05, 'epoch': 0.0}
12%|█▏ | 58/500 [01:16<10:20, 1.40s/it] {'loss': 0.0, 'learning_rate': 8.840000000000001e-05, 'epoch': 0.0}
12%|█▏ | 59/500 [01:17<09:11, 1.25s/it] {'loss': 0.0, 'learning_rate': 8.82e-05, 'epoch': 0.0}
12%|█▏ | 60/500 [01:17<08:25, 1.15s/it] {'loss': 0.0, 'learning_rate': 8.800000000000001e-05, 'epoch': 0.0}
12%|█▏ | 61/500 [01:18<07:51, 1.07s/it] {'loss': 0.0, 'learning_rate': 8.78e-05, 'epoch': 0.0}
12%|█▏ | 62/500 [01:19<07:29, 1.03s/it] {'loss': 0.0, 'learning_rate': 8.76e-05, 'epoch': 0.0}
13%|█▎ | 63/500 [01:20<07:11, 1.01it/s] {'loss': 0.0, 'learning_rate': 8.740000000000001e-05, 'epoch': 0.0}
13%|█▎ | 64/500 [01:21<06:57, 1.05it/s] {'loss': 0.0, 'learning_rate': 8.72e-05, 'epoch': 0.0}
13%|█▎ | 65/500 [01:22<06:52, 1.05it/s] {'loss': 0.0, 'learning_rate': 8.7e-05, 'epoch': 0.0}
13%|█▎ | 66/500 [01:23<07:08, 1.01it/s] {'loss': 0.0, 'learning_rate': 8.680000000000001e-05, 'epoch': 0.01}
13%|█▎ | 67/500 [01:24<07:43, 1.07s/it] {'loss': 0.0, 'learning_rate': 8.66e-05, 'epoch': 0.01}
14%|█▎ | 68/500 [01:26<08:16, 1.15s/it] {'loss': 0.0, 'learning_rate': 8.64e-05, 'epoch': 0.01}
14%|█▍ | 69/500 [01:27<08:38, 1.20s/it] {'loss': 0.0, 'learning_rate': 8.620000000000001e-05, 'epoch': 0.01}
14%|█▍ | 70/500 [01:28<08:32, 1.19s/it] {'loss': 0.0, 'learning_rate': 8.6e-05, 'epoch': 0.01}
14%|█▍ | 71/500 [01:29<07:31, 1.05s/it] {'loss': 0.0, 'learning_rate': 8.58e-05, 'epoch': 0.01}
14%|█▍ | 72/500 [01:29<06:12, 1.15it/s] {'loss': 0.0, 'learning_rate': 8.560000000000001e-05, 'epoch': 0.01}
15%|█▍ | 73/500 [01:30<05:17, 1.34it/s] {'loss': 0.0, 'learning_rate': 8.54e-05, 'epoch': 0.01}
15%|█▍ | 74/500 [01:30<04:39, 1.53it/s] {'loss': 0.0, 'learning_rate': 8.52e-05, 'epoch': 0.01}
15%|█▌ | 75/500 [01:31<04:12, 1.68it/s] {'loss': 0.0, 'learning_rate': 8.5e-05, 'epoch': 0.01}
15%|█▌ | 76/500 [01:31<03:53, 1.82it/s] {'loss': 0.0, 'learning_rate': 8.48e-05, 'epoch': 0.01}
15%|█▌ | 77/500 [01:32<03:40, 1.92it/s] {'loss': 0.0, 'learning_rate': 8.46e-05, 'epoch': 0.01}
16%|█▌ | 78/500 [01:32<03:30, 2.00it/s] {'loss': 0.0, 'learning_rate': 8.44e-05, 'epoch': 0.01}
16%|█▌ | 79/500 [01:33<03:23, 2.07it/s] {'loss': 0.0, 'learning_rate': 8.42e-05, 'epoch': 0.01}
16%|█▌ | 80/500 [01:33<03:20, 2.10it/s] {'loss': 0.0, 'learning_rate': 8.4e-05, 'epoch': 0.01}
16%|█▌ | 81/500 [01:33<03:17, 2.12it/s] {'loss': 0.0, 'learning_rate': 8.38e-05, 'epoch': 0.01}
16%|█▋ | 82/500 [01:34<03:15, 2.14it/s] {'loss': 0.0, 'learning_rate': 8.36e-05, 'epoch': 0.01}
17%|█▋ | 83/500 [01:34<03:14, 2.15it/s] {'loss': 0.0, 'learning_rate': 8.34e-05, 'epoch': 0.01}
17%|█▋ | 84/500 [01:35<03:13, 2.15it/s] {'loss': 0.0, 'learning_rate': 8.32e-05, 'epoch': 0.01}
17%|█▋ | 85/500 [01:35<03:12, 2.16it/s] {'loss': 0.0, 'learning_rate': 8.3e-05, 'epoch': 0.01}
17%|█▋ | 86/500 [01:36<03:12, 2.16it/s] {'loss': 0.0, 'learning_rate': 8.28e-05, 'epoch': 0.01}
17%|█▋ | 87/500 [01:36<03:10, 2.16it/s] {'loss': 0.0, 'learning_rate': 8.26e-05, 'epoch': 0.01}
18%|█▊ | 88/500 [01:37<03:10, 2.16it/s] {'loss': 0.0, 'learning_rate': 8.24e-05, 'epoch': 0.01}
18%|█▊ | 89/500 [01:37<03:09, 2.17it/s] {'loss': 0.0, 'learning_rate': 8.22e-05, 'epoch': 0.01}
18%|█▊ | 90/500 [01:38<03:09, 2.17it/s] {'loss': 0.0, 'learning_rate': 8.2e-05, 'epoch': 0.01}
18%|█▊ | 91/500 [01:38<03:13, 2.11it/s] {'loss': 0.0, 'learning_rate': 8.18e-05, 'epoch': 0.01}
18%|█▊ | 92/500 [01:39<03:14, 2.10it/s] {'loss': 0.0, 'learning_rate': 8.16e-05, 'epoch': 0.01}
19%|█▊ | 93/500 [01:39<03:18, 2.05it/s] {'loss': 0.0, 'learning_rate': 8.14e-05, 'epoch': 0.01}
19%|█▉ | 94/500 [01:40<03:16, 2.07it/s] {'loss': 0.0, 'learning_rate': 8.120000000000001e-05, 'epoch': 0.01}
19%|█▉ | 95/500 [01:40<03:12, 2.10it/s] {'loss': 0.0, 'learning_rate': 8.1e-05, 'epoch': 0.01}
19%|█▉ | 96/500 [01:41<03:19, 2.02it/s] {'loss': 0.0, 'learning_rate': 8.080000000000001e-05, 'epoch': 0.01}
19%|█▉ | 97/500 [01:42<04:17, 1.57it/s] {'loss': 0.0, 'learning_rate': 8.060000000000001e-05, 'epoch': 0.01}
20%|█▉ | 98/500 [01:43<05:47, 1.16it/s] {'loss': 0.0, 'learning_rate': 8.04e-05, 'epoch': 0.01}
20%|█▉ | 99/500 [01:44<06:58, 1.04s/it] {'loss': 0.0, 'learning_rate': 8.020000000000001e-05, 'epoch': 0.01}
20%|██ | 100/500 [01:46<07:36, 1.14s/it] {'loss': 0.0, 'learning_rate': 8e-05, 'epoch': 0.01}
[INFO|tokenization_utils_base.py:2437] 2023-12-10 15:29:08,477 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-100/tokenizer_config.json
[INFO|tokenization_utils_base.py:2446] 2023-12-10 15:29:08,478 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-100/special_tokens_map.json
20%|██ | 101/500 [01:47<07:13, 1.09s/it] {'loss': 0.0, 'learning_rate': 7.98e-05, 'epoch': 0.01}
20%|██ | 102/500 [01:48<06:56, 1.05s/it] {'loss': 0.0, 'learning_rate': 7.960000000000001e-05, 'epoch': 0.01}
21%|██ | 103/500 [01:49<06:42, 1.01s/it] {'loss': 0.0, 'learning_rate': 7.94e-05, 'epoch': 0.01}
21%|██ | 104/500 [01:50<06:31, 1.01it/s] {'loss': 0.0, 'learning_rate': 7.920000000000001e-05, 'epoch': 0.01}
21%|██ | 105/500 [01:50<06:20, 1.04it/s] {'loss': 0.0, 'learning_rate': 7.900000000000001e-05, 'epoch': 0.01}
21%|██ | 106/500 [01:51<05:52, 1.12it/s] {'loss': 0.0, 'learning_rate': 7.88e-05, 'epoch': 0.01}
21%|██▏ | 107/500 [01:52<05:32, 1.18it/s] {'loss': 0.0, 'learning_rate': 7.860000000000001e-05, 'epoch': 0.01}
22%|██▏ | 108/500 [01:53<05:36, 1.16it/s] {'loss': 0.0, 'learning_rate': 7.840000000000001e-05, 'epoch': 0.01}
22%|██▏ | 109/500 [01:54<05:25, 1.20it/s] {'loss': 0.0, 'learning_rate': 7.82e-05, 'epoch': 0.01}
22%|██▏ | 110/500 [01:54<05:10, 1.26it/s] {'loss': 0.0, 'learning_rate': 7.800000000000001e-05, 'epoch': 0.01}
22%|██▏ | 111/500 [01:55<05:12, 1.25it/s] {'loss': 0.0, 'learning_rate': 7.780000000000001e-05, 'epoch': 0.01}
22%|██▏ | 112/500 [01:56<05:02, 1.28it/s] {'loss': 0.0, 'learning_rate': 7.76e-05, 'epoch': 0.01}
23%|██▎ | 113/500 [01:57<04:59, 1.29it/s] {'loss': 0.0, 'learning_rate': 7.740000000000001e-05, 'epoch': 0.01}
23%|██▎ | 114/500 [01:57<04:59, 1.29it/s] {'loss': 0.0, 'learning_rate': 7.72e-05, 'epoch': 0.01}
23%|██▎ | 115/500 [01:58<05:12, 1.23it/s] {'loss': 0.0, 'learning_rate': 7.7e-05, 'epoch': 0.01}
23%|██▎ | 116/500 [01:59<05:44, 1.11it/s] {'loss': 0.0, 'learning_rate': 7.680000000000001e-05, 'epoch': 0.01}
23%|██▎ | 117/500 [02:01<06:36, 1.03s/it] {'loss': 0.0, 'learning_rate': 7.66e-05, 'epoch': 0.01}
24%|██▎ | 118/500 [02:02<07:16, 1.14s/it] {'loss': 0.0, 'learning_rate': 7.64e-05, 'epoch': 0.01}
24%|██▍ | 119/500 [02:03<07:46, 1.22s/it] {'loss': 0.0, 'learning_rate': 7.620000000000001e-05, 'epoch': 0.01}
24%|██▍ | 120/500 [02:05<08:08, 1.29s/it] {'loss': 0.0, 'learning_rate': 7.6e-05, 'epoch': 0.01}
24%|██▍ | 121/500 [02:06<08:16, 1.31s/it] {'loss': 0.0, 'learning_rate': 7.58e-05, 'epoch': 0.01}
24%|██▍ | 122/500 [02:08<08:28, 1.35s/it] {'loss': 0.0, 'learning_rate': 7.560000000000001e-05, 'epoch': 0.01}
25%|██▍ | 123/500 [02:09<08:34, 1.36s/it] {'loss': 0.0, 'learning_rate': 7.54e-05, 'epoch': 0.01}
25%|██▍ | 124/500 [02:11<08:42, 1.39s/it] {'loss': 0.0, 'learning_rate': 7.52e-05, 'epoch': 0.01}
25%|██▌ | 125/500 [02:12<08:43, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.500000000000001e-05, 'epoch': 0.01}
25%|██▌ | 126/500 [02:13<08:43, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.48e-05, 'epoch': 0.01}
25%|██▌ | 127/500 [02:15<08:47, 1.42s/it] {'loss': 0.0, 'learning_rate': 7.46e-05, 'epoch': 0.01}
26%|██▌ | 128/500 [02:16<08:46, 1.42s/it] {'loss': 0.0, 'learning_rate': 7.44e-05, 'epoch': 0.01}
26%|██▌ | 129/500 [02:18<08:45, 1.42s/it] {'loss': 0.0, 'learning_rate': 7.42e-05, 'epoch': 0.01}
26%|██▌ | 130/500 [02:19<08:42, 1.41s/it] {'loss': 0.0, 'learning_rate': 7.4e-05, 'epoch': 0.01}
26%|██▌ | 131/500 [02:20<08:38, 1.41s/it] {'loss': 0.0, 'learning_rate': 7.38e-05, 'epoch': 0.01}
26%|██▋ | 132/500 [02:22<08:35, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.36e-05, 'epoch': 0.01}
27%|██▋ | 133/500 [02:23<08:37, 1.41s/it] {'loss': 0.0, 'learning_rate': 7.340000000000001e-05, 'epoch': 0.01}
27%|██▋ | 134/500 [02:25<08:30, 1.39s/it] {'loss': 0.0, 'learning_rate': 7.32e-05, 'epoch': 0.01}
27%|██▋ | 135/500 [02:26<08:32, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.3e-05, 'epoch': 0.01}
27%|██▋ | 136/500 [02:28<08:35, 1.42s/it] {'loss': 0.0, 'learning_rate': 7.280000000000001e-05, 'epoch': 0.01}
27%|██▋ | 137/500 [02:29<08:33, 1.42s/it] {'loss': 0.0, 'learning_rate': 7.26e-05, 'epoch': 0.01}
28%|██▊ | 138/500 [02:30<08:29, 1.41s/it] {'loss': 0.0, 'learning_rate': 7.24e-05, 'epoch': 0.01}
28%|██▊ | 139/500 [02:32<08:22, 1.39s/it] {'loss': 0.0, 'learning_rate': 7.22e-05, 'epoch': 0.01}
28%|██▊ | 140/500 [02:33<08:21, 1.39s/it] {'loss': 0.0, 'learning_rate': 7.2e-05, 'epoch': 0.01}
28%|██▊ | 141/500 [02:34<08:20, 1.39s/it] {'loss': 0.0, 'learning_rate': 7.18e-05, 'epoch': 0.01}
28%|██▊ | 142/500 [02:36<08:21, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.16e-05, 'epoch': 0.01}
29%|██▊ | 143/500 [02:37<08:20, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.14e-05, 'epoch': 0.01}
29%|██▉ | 144/500 [02:39<08:20, 1.41s/it] {'loss': 0.0, 'learning_rate': 7.12e-05, 'epoch': 0.01}
29%|██▉ | 145/500 [02:40<08:17, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.1e-05, 'epoch': 0.01}
29%|██▉ | 146/500 [02:41<08:15, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.08e-05, 'epoch': 0.01}
29%|β–ˆβ–ˆβ–‰ | 146/500 [02:42<08:15, 1.40s/it] 29%|β–ˆβ–ˆβ–‰ | 147/500 [02:43<08:15, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.06e-05, 'epoch': 0.01}
29%|β–ˆβ–ˆβ–‰ | 147/500 [02:43<08:15, 1.40s/it] 30%|β–ˆβ–ˆβ–‰ | 148/500 [02:44<08:12, 1.40s/it] {'loss': 0.0, 'learning_rate': 7.04e-05, 'epoch': 0.01}
30%|β–ˆβ–ˆβ–‰ | 148/500 [02:44<08:12, 1.40s/it] 30%|β–ˆβ–ˆβ–‰ | 149/500 [02:46<08:16, 1.41s/it] {'loss': 0.0, 'learning_rate': 7.02e-05, 'epoch': 0.01}
30%|β–ˆβ–ˆβ–‰ | 149/500 [02:46<08:16, 1.41s/it] 30%|β–ˆβ–ˆβ–ˆ | 150/500 [02:47<08:08, 1.40s/it] {'loss': 0.0, 'learning_rate': 7e-05, 'epoch': 0.01}
30%|███ | 150/500 [02:47<08:08, 1.40s/it]
[INFO|tokenization_utils_base.py:2437] 2023-12-10 15:30:09,868 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-150/tokenizer_config.json
[INFO|tokenization_utils_base.py:2446] 2023-12-10 15:30:09,869 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-150/special_tokens_map.json
30%|β–ˆβ–ˆβ–ˆ | 151/500 [02:49<08:12, 1.41s/it] {'loss': 0.0, 'learning_rate': 6.98e-05, 'epoch': 0.01}
30%|β–ˆβ–ˆβ–ˆ | 151/500 [02:49<08:12, 1.41s/it] 30%|β–ˆβ–ˆβ–ˆ | 152/500 [02:50<08:13, 1.42s/it] {'loss': 0.0, 'learning_rate': 6.96e-05, 'epoch': 0.01}
30%|β–ˆβ–ˆβ–ˆ | 152/500 [02:50<08:13, 1.42s/it] 31%|β–ˆβ–ˆβ–ˆ | 153/500 [02:51<08:08, 1.41s/it] {'loss': 0.0, 'learning_rate': 6.939999999999999e-05, 'epoch': 0.01}
31%|β–ˆβ–ˆβ–ˆ | 153/500 [02:51<08:08, 1.41s/it] 31%|β–ˆβ–ˆβ–ˆ | 154/500 [02:53<08:01, 1.39s/it] {'loss': 0.0, 'learning_rate': 6.92e-05, 'epoch': 0.01}
31%|β–ˆβ–ˆβ–ˆ | 154/500 [02:53<08:01, 1.39s/it] 31%|β–ˆβ–ˆβ–ˆ | 155/500 [02:54<07:59, 1.39s/it] {'loss': 0.0, 'learning_rate': 6.9e-05, 'epoch': 0.01}
31%|β–ˆβ–ˆβ–ˆ | 155/500 [02:54<07:59, 1.39s/it] 31%|β–ˆβ–ˆβ–ˆ | 156/500 [02:56<08:01, 1.40s/it] {'loss': 0.0, 'learning_rate': 6.879999999999999e-05, 'epoch': 0.01}
31%|β–ˆβ–ˆβ–ˆ | 156/500 [02:56<08:01, 1.40s/it] 31%|β–ˆβ–ˆβ–ˆβ– | 157/500 [02:57<08:01, 1.40s/it] {'loss': 0.0, 'learning_rate': 6.860000000000001e-05, 'epoch': 0.01}
31%|β–ˆβ–ˆβ–ˆβ– | 157/500 [02:57<08:01, 1.40s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 158/500 [02:58<07:59, 1.40s/it] {'loss': 0.0, 'learning_rate': 6.840000000000001e-05, 'epoch': 0.01}
32%|β–ˆβ–ˆβ–ˆβ– | 158/500 [02:58<07:59, 1.40s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 159/500 [03:00<08:00, 1.41s/it] {'loss': 0.0, 'learning_rate': 6.82e-05, 'epoch': 0.01}
32%|β–ˆβ–ˆβ–ˆβ– | 159/500 [03:00<08:00, 1.41s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 160/500 [03:01<07:55, 1.40s/it] {'loss': 0.0, 'learning_rate': 6.800000000000001e-05, 'epoch': 0.01}
32%|β–ˆβ–ˆβ–ˆβ– | 160/500 [03:01<07:55, 1.40s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 161/500 [03:03<08:00, 1.42s/it] {'loss': 0.0, 'learning_rate': 6.780000000000001e-05, 'epoch': 0.01}
32%|β–ˆβ–ˆβ–ˆβ– | 161/500 [03:03<08:00, 1.42s/it] 32%|β–ˆβ–ˆβ–ˆβ– | 162/500 [03:04<07:59, 1.42s/it] {'loss': 0.0, 'learning_rate': 6.76e-05, 'epoch': 0.01}
32%|β–ˆβ–ˆβ–ˆβ– | 162/500 [03:04<07:59, 1.42s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 163/500 [03:05<07:55, 1.41s/it] {'loss': 0.0, 'learning_rate': 6.740000000000001e-05, 'epoch': 0.01}
33%|β–ˆβ–ˆβ–ˆβ–Ž | 163/500 [03:05<07:55, 1.41s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 164/500 [03:07<07:55, 1.42s/it] {'loss': 0.0, 'learning_rate': 6.720000000000001e-05, 'epoch': 0.01}
33%|β–ˆβ–ˆβ–ˆβ–Ž | 164/500 [03:07<07:55, 1.42s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 165/500 [03:08<07:58, 1.43s/it] {'loss': 0.0, 'learning_rate': 6.7e-05, 'epoch': 0.01}
33%|β–ˆβ–ˆβ–ˆβ–Ž | 165/500 [03:08<07:58, 1.43s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 166/500 [03:10<07:56, 1.43s/it] {'loss': 0.0, 'learning_rate': 6.680000000000001e-05, 'epoch': 0.01}
33%|β–ˆβ–ˆβ–ˆβ–Ž | 166/500 [03:10<07:56, 1.43s/it] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 167/500 [03:11<07:58, 1.44s/it] {'loss': 0.0, 'learning_rate': 6.66e-05, 'epoch': 0.01}
33%|β–ˆβ–ˆβ–ˆβ–Ž | 167/500 [03:11<07:58, 1.44s/it] 34%|β–ˆβ–ˆβ–ˆβ–Ž | 168/500 [03:13<07:57, 1.44s/it] {'loss': 0.0, 'learning_rate': 6.64e-05, 'epoch': 0.01}
34%|β–ˆβ–ˆβ–ˆβ–Ž | 168/500 [03:13<07:57, 1.44s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 169/500 [03:14<07:57, 1.44s/it] {'loss': 0.0, 'learning_rate': 6.620000000000001e-05, 'epoch': 0.01}
34%|β–ˆβ–ˆβ–ˆβ– | 169/500 [03:14<07:57, 1.44s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 170/500 [03:16<08:00, 1.46s/it] {'loss': 0.0, 'learning_rate': 6.6e-05, 'epoch': 0.01}
34%|β–ˆβ–ˆβ–ˆβ– | 170/500 [03:16<08:00, 1.46s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 171/500 [03:17<07:57, 1.45s/it] {'loss': 0.0, 'learning_rate': 6.58e-05, 'epoch': 0.01}
34%|β–ˆβ–ˆβ–ˆβ– | 171/500 [03:17<07:57, 1.45s/it] 34%|β–ˆβ–ˆβ–ˆβ– | 172/500 [03:18<07:55, 1.45s/it] {'loss': 0.0, 'learning_rate': 6.560000000000001e-05, 'epoch': 0.01}
34%|β–ˆβ–ˆβ–ˆβ– | 172/500 [03:18<07:55, 1.45s/it] 35%|β–ˆβ–ˆβ–ˆβ– | 173/500 [03:20<07:57, 1.46s/it] {'loss': 0.0, 'learning_rate': 6.54e-05, 'epoch': 0.01}
35%|β–ˆβ–ˆβ–ˆβ– | 173/500 [03:20<07:57, 1.46s/it] 35%|β–ˆβ–ˆβ–ˆβ– | 174/500 [03:21<07:48, 1.44s/it] {'loss': 0.0, 'learning_rate': 6.52e-05, 'epoch': 0.01}
35%|β–ˆβ–ˆβ–ˆβ– | 174/500 [03:21<07:48, 1.44s/it] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 175/500 [03:22<06:59, 1.29s/it] {'loss': 0.0, 'learning_rate': 6.500000000000001e-05, 'epoch': 0.01}
35%|β–ˆβ–ˆβ–ˆβ–Œ | 175/500 [03:22<06:59, 1.29s/it] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 176/500 [03:23<06:21, 1.18s/it] {'loss': 0.0, 'learning_rate': 6.48e-05, 'epoch': 0.01}
35%|β–ˆβ–ˆβ–ˆβ–Œ | 176/500 [03:23<06:21, 1.18s/it] 35%|β–ˆβ–ˆβ–ˆβ–Œ | 177/500 [03:24<05:43, 1.06s/it] {'loss': 0.0, 'learning_rate': 6.460000000000001e-05, 'epoch': 0.01}
35%|β–ˆβ–ˆβ–ˆβ–Œ | 177/500 [03:24<05:43, 1.06s/it] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 178/500 [03:24<04:43, 1.14it/s] {'loss': 0.0, 'learning_rate': 6.440000000000001e-05, 'epoch': 0.01}
36%|β–ˆβ–ˆβ–ˆβ–Œ | 178/500 [03:24<04:43, 1.14it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 179/500 [03:25<04:00, 1.34it/s] {'loss': 0.0, 'learning_rate': 6.42e-05, 'epoch': 0.01}
36%|β–ˆβ–ˆβ–ˆβ–Œ | 179/500 [03:25<04:00, 1.34it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 180/500 [03:25<03:30, 1.52it/s] {'loss': 0.0, 'learning_rate': 6.400000000000001e-05, 'epoch': 0.01}
36%|β–ˆβ–ˆβ–ˆβ–Œ | 180/500 [03:25<03:30, 1.52it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 181/500 [03:26<03:09, 1.68it/s] {'loss': 0.0, 'learning_rate': 6.38e-05, 'epoch': 0.01}
36%|β–ˆβ–ˆβ–ˆβ–Œ | 181/500 [03:26<03:09, 1.68it/s] 36%|β–ˆβ–ˆβ–ˆβ–‹ | 182/500 [03:26<02:54, 1.82it/s] {'loss': 0.0, 'learning_rate': 6.36e-05, 'epoch': 0.01}
36%|β–ˆβ–ˆβ–ˆβ–‹ | 182/500 [03:26<02:54, 1.82it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 183/500 [03:27<02:44, 1.92it/s] {'loss': 0.0, 'learning_rate': 6.340000000000001e-05, 'epoch': 0.01}
37%|β–ˆβ–ˆβ–ˆβ–‹ | 183/500 [03:27<02:44, 1.92it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 184/500 [03:27<02:37, 2.00it/s] {'loss': 0.0, 'learning_rate': 6.32e-05, 'epoch': 0.01}
37%|β–ˆβ–ˆβ–ˆβ–‹ | 184/500 [03:27<02:37, 2.00it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 185/500 [03:28<02:33, 2.05it/s] {'loss': 0.0, 'learning_rate': 6.3e-05, 'epoch': 0.01}
37%|β–ˆβ–ˆβ–ˆβ–‹ | 185/500 [03:28<02:33, 2.05it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 186/500 [03:28<02:30, 2.09it/s] {'loss': 0.0, 'learning_rate': 6.280000000000001e-05, 'epoch': 0.01}
37%|β–ˆβ–ˆβ–ˆβ–‹ | 186/500 [03:28<02:30, 2.09it/s] 37%|β–ˆβ–ˆβ–ˆβ–‹ | 187/500 [03:28<02:27, 2.12it/s] {'loss': 0.0, 'learning_rate': 6.26e-05, 'epoch': 0.01}
37%|β–ˆβ–ˆβ–ˆβ–‹ | 187/500 [03:28<02:27, 2.12it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 188/500 [03:29<02:25, 2.14it/s] {'loss': 0.0, 'learning_rate': 6.24e-05, 'epoch': 0.01}
38%|β–ˆβ–ˆβ–ˆβ–Š | 188/500 [03:29<02:25, 2.14it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 189/500 [03:29<02:24, 2.15it/s] {'loss': 0.0, 'learning_rate': 6.220000000000001e-05, 'epoch': 0.01}
38%|β–ˆβ–ˆβ–ˆβ–Š | 189/500 [03:29<02:24, 2.15it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 190/500 [03:30<02:23, 2.16it/s] {'loss': 0.0, 'learning_rate': 6.2e-05, 'epoch': 0.01}
38%|β–ˆβ–ˆβ–ˆβ–Š | 190/500 [03:30<02:23, 2.16it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 191/500 [03:30<02:27, 2.10it/s] {'loss': 0.0, 'learning_rate': 6.18e-05, 'epoch': 0.01}
38%|β–ˆβ–ˆβ–ˆβ–Š | 191/500 [03:30<02:27, 2.10it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 192/500 [03:31<02:42, 1.90it/s] {'loss': 0.0, 'learning_rate': 6.16e-05, 'epoch': 0.01}
38%|β–ˆβ–ˆβ–ˆβ–Š | 192/500 [03:31<02:42, 1.90it/s] 39%|β–ˆβ–ˆβ–ˆβ–Š | 193/500 [03:32<03:04, 1.66it/s] {'loss': 0.0, 'learning_rate': 6.14e-05, 'epoch': 0.01}
39%|β–ˆβ–ˆβ–ˆβ–Š | 193/500 [03:32<03:04, 1.66it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 194/500 [03:33<03:20, 1.52it/s] {'loss': 0.0, 'learning_rate': 6.12e-05, 'epoch': 0.01}
39%|β–ˆβ–ˆβ–ˆβ–‰ | 194/500 [03:33<03:20, 1.52it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 195/500 [03:34<04:13, 1.21it/s] {'loss': 0.0, 'learning_rate': 6.1e-05, 'epoch': 0.01}
39%|β–ˆβ–ˆβ–ˆβ–‰ | 195/500 [03:34<04:13, 1.21it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 196/500 [03:35<04:31, 1.12it/s] {'loss': 0.0, 'learning_rate': 6.08e-05, 'epoch': 0.02}
39%|β–ˆβ–ˆβ–ˆβ–‰ | 196/500 [03:35<04:31, 1.12it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 197/500 [03:36<04:32, 1.11it/s] {'loss': 0.0, 'learning_rate': 6.06e-05, 'epoch': 0.02}
39%|β–ˆβ–ˆβ–ˆβ–‰ | 197/500 [03:36<04:32, 1.11it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 198/500 [03:36<04:17, 1.17it/s] {'loss': 0.0, 'learning_rate': 6.04e-05, 'epoch': 0.02}
40%|β–ˆβ–ˆβ–ˆβ–‰ | 198/500 [03:36<04:17, 1.17it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 199/500 [03:37<03:39, 1.37it/s] {'loss': 0.0, 'learning_rate': 6.02e-05, 'epoch': 0.02}
40%|β–ˆβ–ˆβ–ˆβ–‰ | 199/500 [03:37<03:39, 1.37it/s] 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 200/500 [03:37<03:13, 1.55it/s] {'loss': 0.0, 'learning_rate': 6e-05, 'epoch': 0.02}
40%|████ | 200/500 [03:37<03:13, 1.55it/s]
[INFO|tokenization_utils_base.py:2437] 2023-12-10 15:31:00,119 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-200/tokenizer_config.json
[INFO|tokenization_utils_base.py:2446] 2023-12-10 15:31:00,120 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-200/special_tokens_map.json
40%|β–ˆβ–ˆβ–ˆβ–ˆ | 201/500 [03:38<02:58, 1.68it/s] {'loss': 0.0, 'learning_rate': 5.9800000000000003e-05, 'epoch': 0.02}
40%|β–ˆβ–ˆβ–ˆβ–ˆ | 201/500 [03:38<02:58, 1.68it/s] 40%|β–ˆβ–ˆβ–ˆβ–ˆ | 202/500 [03:38<02:43, 1.82it/s] {'loss': 0.0, 'learning_rate': 5.96e-05, 'epoch': 0.02}
40%|β–ˆβ–ˆβ–ˆβ–ˆ | 202/500 [03:38<02:43, 1.82it/s] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 203/500 [03:39<02:33, 1.93it/s] {'loss': 0.0, 'learning_rate': 5.94e-05, 'epoch': 0.02}
41%|β–ˆβ–ˆβ–ˆβ–ˆ | 203/500 [03:39<02:33, 1.93it/s] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 204/500 [03:39<02:26, 2.02it/s] {'loss': 0.0, 'learning_rate': 5.92e-05, 'epoch': 0.02}
41%|β–ˆβ–ˆβ–ˆβ–ˆ | 204/500 [03:39<02:26, 2.02it/s] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 205/500 [03:40<02:21, 2.08it/s] {'loss': 0.0, 'learning_rate': 5.9e-05, 'epoch': 0.02}
41%|β–ˆβ–ˆβ–ˆβ–ˆ | 205/500 [03:40<02:21, 2.08it/s] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 206/500 [03:40<02:18, 2.13it/s] {'loss': 0.0, 'learning_rate': 5.88e-05, 'epoch': 0.02}
41%|β–ˆβ–ˆβ–ˆβ–ˆ | 206/500 [03:40<02:18, 2.13it/s] 41%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 207/500 [03:41<02:15, 2.16it/s] {'loss': 0.0, 'learning_rate': 5.86e-05, 'epoch': 0.02}
41%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 207/500 [03:41<02:15, 2.16it/s] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 208/500 [03:41<02:14, 2.17it/s] {'loss': 0.0, 'learning_rate': 5.8399999999999997e-05, 'epoch': 0.02}
42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 208/500 [03:41<02:14, 2.17it/s] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 209/500 [03:41<02:13, 2.18it/s] {'loss': 0.0, 'learning_rate': 5.82e-05, 'epoch': 0.02}
42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 209/500 [03:41<02:13, 2.18it/s] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 210/500 [03:42<02:13, 2.18it/s] {'loss': 0.0, 'learning_rate': 5.8e-05, 'epoch': 0.02}
42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 210/500 [03:42<02:13, 2.18it/s] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 211/500 [03:42<02:12, 2.18it/s] {'loss': 0.0, 'learning_rate': 5.7799999999999995e-05, 'epoch': 0.02}
42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 211/500 [03:42<02:12, 2.18it/s] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 212/500 [03:43<02:11, 2.18it/s] {'loss': 0.0, 'learning_rate': 5.76e-05, 'epoch': 0.02}
42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 212/500 [03:43<02:11, 2.18it/s] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 213/500 [03:43<02:16, 2.10it/s] {'loss': 0.0, 'learning_rate': 5.74e-05, 'epoch': 0.02}
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 213/500 [03:43<02:16, 2.10it/s] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 214/500 [03:44<02:15, 2.10it/s] {'loss': 0.0, 'learning_rate': 5.72e-05, 'epoch': 0.02}
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 214/500 [03:44<02:15, 2.10it/s] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 215/500 [03:44<02:14, 2.11it/s] {'loss': 0.0, 'learning_rate': 5.6999999999999996e-05, 'epoch': 0.02}
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 215/500 [03:44<02:14, 2.11it/s] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 216/500 [03:45<02:13, 2.12it/s] {'loss': 0.0, 'learning_rate': 5.68e-05, 'epoch': 0.02}
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 216/500 [03:45<02:13, 2.12it/s] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 217/500 [03:45<02:15, 2.09it/s] {'loss': 0.0, 'learning_rate': 5.66e-05, 'epoch': 0.02}
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 217/500 [03:45<02:15, 2.09it/s] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 218/500 [03:46<02:17, 2.05it/s] {'loss': 0.0, 'learning_rate': 5.6399999999999995e-05, 'epoch': 0.02}
44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 218/500 [03:46<02:17, 2.05it/s] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 219/500 [03:46<02:16, 2.07it/s] {'loss': 0.0, 'learning_rate': 5.620000000000001e-05, 'epoch': 0.02}
44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 219/500 [03:46<02:16, 2.07it/s] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 220/500 [03:47<02:25, 1.92it/s] {'loss': 0.0, 'learning_rate': 5.6000000000000006e-05, 'epoch': 0.02}
44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 220/500 [03:47<02:25, 1.92it/s] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 221/500 [03:47<02:33, 1.82it/s] {'loss': 0.0, 'learning_rate': 5.580000000000001e-05, 'epoch': 0.02}
44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 221/500 [03:47<02:33, 1.82it/s] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 222/500 [03:48<02:29, 1.86it/s] {'loss': 0.0, 'learning_rate': 5.560000000000001e-05, 'epoch': 0.02}
44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 222/500 [03:48<02:29, 1.86it/s] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 223/500 [03:48<02:23, 1.93it/s] {'loss': 0.0, 'learning_rate': 5.5400000000000005e-05, 'epoch': 0.02}
45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 223/500 [03:48<02:23, 1.93it/s] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 224/500 [03:49<02:27, 1.87it/s] {'loss': 0.0, 'learning_rate': 5.520000000000001e-05, 'epoch': 0.02}
45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 224/500 [03:49<02:27, 1.87it/s] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 225/500 [03:50<02:28, 1.85it/s] {'loss': 0.0, 'learning_rate': 5.500000000000001e-05, 'epoch': 0.02}
45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 225/500 [03:50<02:28, 1.85it/s] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 226/500 [03:50<02:28, 1.84it/s] {'loss': 0.0, 'learning_rate': 5.4800000000000004e-05, 'epoch': 0.02}
45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 226/500 [03:50<02:28, 1.84it/s] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 227/500 [03:51<02:38, 1.72it/s] {'loss': 0.0, 'learning_rate': 5.4600000000000006e-05, 'epoch': 0.02}
45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 227/500 [03:51<02:38, 1.72it/s] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 228/500 [03:51<02:33, 1.77it/s] {'loss': 0.0, 'learning_rate': 5.440000000000001e-05, 'epoch': 0.02}
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 228/500 [03:51<02:33, 1.77it/s] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 229/500 [03:52<02:31, 1.79it/s] {'loss': 0.0, 'learning_rate': 5.420000000000001e-05, 'epoch': 0.02}
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 229/500 [03:52<02:31, 1.79it/s] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 230/500 [03:52<02:34, 1.75it/s] {'loss': 0.0, 'learning_rate': 5.4000000000000005e-05, 'epoch': 0.02}
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 230/500 [03:52<02:34, 1.75it/s] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 231/500 [03:53<02:40, 1.68it/s] {'loss': 0.0, 'learning_rate': 5.380000000000001e-05, 'epoch': 0.02}
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 231/500 [03:53<02:40, 1.68it/s] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 232/500 [03:54<03:22, 1.32it/s] {'loss': 0.0, 'learning_rate': 5.360000000000001e-05, 'epoch': 0.02}
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 232/500 [03:54<03:22, 1.32it/s] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 233/500 [03:56<04:05, 1.09it/s] {'loss': 0.0, 'learning_rate': 5.3400000000000004e-05, 'epoch': 0.02}
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 233/500 [03:56<04:05, 1.09it/s] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 234/500 [03:57<04:46, 1.08s/it] {'loss': 0.0, 'learning_rate': 5.3200000000000006e-05, 'epoch': 0.02}
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 234/500 [03:57<04:46, 1.08s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 235/500 [03:58<05:07, 1.16s/it] {'loss': 0.0, 'learning_rate': 5.300000000000001e-05, 'epoch': 0.02}
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 235/500 [03:58<05:07, 1.16s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 236/500 [04:00<05:26, 1.24s/it] {'loss': 0.0, 'learning_rate': 5.28e-05, 'epoch': 0.02}
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 236/500 [04:00<05:26, 1.24s/it] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 237/500 [04:01<05:33, 1.27s/it] {'loss': 0.0, 'learning_rate': 5.2600000000000005e-05, 'epoch': 0.02}
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 237/500 [04:01<05:33, 1.27s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 238/500 [04:03<05:45, 1.32s/it] {'loss': 0.0, 'learning_rate': 5.2400000000000007e-05, 'epoch': 0.02}
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 238/500 [04:03<05:45, 1.32s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 239/500 [04:04<05:48, 1.34s/it] {'loss': 0.0, 'learning_rate': 5.22e-05, 'epoch': 0.02}
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 239/500 [04:04<05:48, 1.34s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 240/500 [04:05<05:53, 1.36s/it] {'loss': 0.0, 'learning_rate': 5.2000000000000004e-05, 'epoch': 0.02}
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 240/500 [04:05<05:53, 1.36s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 241/500 [04:07<05:56, 1.38s/it] {'loss': 0.0, 'learning_rate': 5.1800000000000005e-05, 'epoch': 0.02}
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 241/500 [04:07<05:56, 1.38s/it] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 242/500 [04:08<05:57, 1.38s/it] {'loss': 0.0, 'learning_rate': 5.16e-05, 'epoch': 0.02}
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 242/500 [04:08<05:57, 1.38s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 243/500 [04:10<05:56, 1.39s/it] {'loss': 0.0, 'learning_rate': 5.14e-05, 'epoch': 0.02}
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 243/500 [04:10<05:56, 1.39s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 244/500 [04:11<06:02, 1.42s/it] {'loss': 0.0, 'learning_rate': 5.1200000000000004e-05, 'epoch': 0.02}
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 244/500 [04:11<06:02, 1.42s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 245/500 [04:12<06:02, 1.42s/it] {'loss': 0.0, 'learning_rate': 5.1000000000000006e-05, 'epoch': 0.02}
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 245/500 [04:12<06:02, 1.42s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 246/500 [04:14<06:01, 1.42s/it] {'loss': 0.0, 'learning_rate': 5.08e-05, 'epoch': 0.02}
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 246/500 [04:14<06:01, 1.42s/it] 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 247/500 [04:15<05:59, 1.42s/it] {'loss': 0.0, 'learning_rate': 5.0600000000000003e-05, 'epoch': 0.02}
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 247/500 [04:15<05:59, 1.42s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 248/500 [04:17<05:55, 1.41s/it] {'loss': 0.0, 'learning_rate': 5.0400000000000005e-05, 'epoch': 0.02}
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 248/500 [04:17<05:55, 1.41s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 249/500 [04:18<05:54, 1.41s/it] {'loss': 0.0, 'learning_rate': 5.02e-05, 'epoch': 0.02}
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 249/500 [04:18<05:54, 1.41s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 250/500 [04:19<05:51, 1.41s/it] {'loss': 0.0, 'learning_rate': 5e-05, 'epoch': 0.02}
50%|█████ | 250/500 [04:20<05:51, 1.41s/it]
[INFO|tokenization_utils_base.py:2437] 2023-12-10 15:31:42,260 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-250/tokenizer_config.json
[INFO|tokenization_utils_base.py:2446] 2023-12-10 15:31:42,261 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-250/special_tokens_map.json
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 251/500 [04:21<06:00, 1.45s/it] {'loss': 0.0, 'learning_rate': 4.9800000000000004e-05, 'epoch': 0.02}
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 251/500 [04:21<06:00, 1.45s/it] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 252/500 [04:22<05:55, 1.43s/it] {'loss': 0.0, 'learning_rate': 4.96e-05, 'epoch': 0.02}
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 252/500 [04:22<05:55, 1.43s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 253/500 [04:24<05:51, 1.42s/it] {'loss': 0.0, 'learning_rate': 4.94e-05, 'epoch': 0.02}
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 253/500 [04:24<05:51, 1.42s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 254/500 [04:25<05:48, 1.42s/it] {'loss': 0.0, 'learning_rate': 4.92e-05, 'epoch': 0.02}
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 254/500 [04:25<05:48, 1.42s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 255/500 [04:27<05:39, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.9e-05, 'epoch': 0.02}
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 255/500 [04:27<05:39, 1.39s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 256/500 [04:28<05:38, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.88e-05, 'epoch': 0.02}
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 256/500 [04:28<05:38, 1.39s/it] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 257/500 [04:29<05:35, 1.38s/it] {'loss': 0.0, 'learning_rate': 4.86e-05, 'epoch': 0.02}
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 257/500 [04:29<05:35, 1.38s/it] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 258/500 [04:31<05:36, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.8400000000000004e-05, 'epoch': 0.02}
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 258/500 [04:31<05:36, 1.39s/it] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 259/500 [04:32<05:33, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.82e-05, 'epoch': 0.02}
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 259/500 [04:32<05:33, 1.39s/it] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 260/500 [04:33<05:30, 1.38s/it] {'loss': 0.0, 'learning_rate': 4.8e-05, 'epoch': 0.02}
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 260/500 [04:33<05:30, 1.38s/it] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 261/500 [04:35<05:31, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.78e-05, 'epoch': 0.02}
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 261/500 [04:35<05:31, 1.39s/it] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 262/500 [04:36<05:31, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.76e-05, 'epoch': 0.02}
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 262/500 [04:36<05:31, 1.39s/it] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 263/500 [04:38<05:31, 1.40s/it] {'loss': 0.0, 'learning_rate': 4.74e-05, 'epoch': 0.02}
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 263/500 [04:38<05:31, 1.40s/it] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 264/500 [04:39<05:28, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.72e-05, 'epoch': 0.02}
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 264/500 [04:39<05:28, 1.39s/it] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 265/500 [04:40<05:27, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.7e-05, 'epoch': 0.02}
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 265/500 [04:40<05:27, 1.39s/it] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 266/500 [04:42<05:25, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.6800000000000006e-05, 'epoch': 0.02}
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 266/500 [04:42<05:25, 1.39s/it] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 267/500 [04:43<05:22, 1.38s/it] {'loss': 0.0, 'learning_rate': 4.660000000000001e-05, 'epoch': 0.02}
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 267/500 [04:43<05:22, 1.38s/it] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 268/500 [04:45<05:23, 1.40s/it] {'loss': 0.0, 'learning_rate': 4.64e-05, 'epoch': 0.02}
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 268/500 [04:45<05:23, 1.40s/it] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 269/500 [04:46<05:22, 1.40s/it] {'loss': 0.0, 'learning_rate': 4.6200000000000005e-05, 'epoch': 0.02}
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 269/500 [04:46<05:22, 1.40s/it] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 270/500 [04:47<05:25, 1.42s/it] {'loss': 0.0, 'learning_rate': 4.600000000000001e-05, 'epoch': 0.02}
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 270/500 [04:47<05:25, 1.42s/it] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 271/500 [04:49<05:23, 1.41s/it] {'loss': 0.0, 'learning_rate': 4.58e-05, 'epoch': 0.02}
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 271/500 [04:49<05:23, 1.41s/it] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 272/500 [04:50<05:20, 1.41s/it] {'loss': 0.0, 'learning_rate': 4.5600000000000004e-05, 'epoch': 0.02}
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 272/500 [04:50<05:20, 1.41s/it] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 273/500 [04:52<05:22, 1.42s/it] {'loss': 0.0, 'learning_rate': 4.5400000000000006e-05, 'epoch': 0.02}
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 273/500 [04:52<05:22, 1.42s/it] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 274/500 [04:53<05:14, 1.39s/it] {'loss': 0.0, 'learning_rate': 4.52e-05, 'epoch': 0.02}
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 274/500 [04:53<05:14, 1.39s/it] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 275/500 [04:55<05:17, 1.41s/it] {'loss': 0.0, 'learning_rate': 4.5e-05, 'epoch': 0.02}
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 275/500 [04:55<05:17, 1.41s/it] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 276/500 [04:56<05:15, 1.41s/it] {'loss': 0.0, 'learning_rate': 4.4800000000000005e-05, 'epoch': 0.02}
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 276/500 [04:56<05:15, 1.41s/it] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 277/500 [04:57<05:15, 1.42s/it] {'loss': 0.0, 'learning_rate': 4.46e-05, 'epoch': 0.02}
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 277/500 [04:57<05:15, 1.42s/it] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 278/500 [04:59<05:17, 1.43s/it] {'loss': 0.0, 'learning_rate': 4.44e-05, 'epoch': 0.02}
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 278/500 [04:59<05:17, 1.43s/it] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 279/500 [05:00<05:15, 1.43s/it] {'loss': 0.0, 'learning_rate': 4.4200000000000004e-05, 'epoch': 0.02}
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 279/500 [05:00<05:15, 1.43s/it] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 280/500 [05:02<05:16, 1.44s/it] {'loss': 0.0, 'learning_rate': 4.4000000000000006e-05, 'epoch': 0.02}
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 280/500 [05:02<05:16, 1.44s/it] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 281/500 [05:03<05:17, 1.45s/it] {'loss': 0.0, 'learning_rate': 4.38e-05, 'epoch': 0.02}
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 281/500 [05:03<05:17, 1.45s/it] 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 282/500 [05:05<05:17, 1.45s/it] {'loss': 0.0, 'learning_rate': 4.36e-05, 'epoch': 0.02}
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 282/500 [05:05<05:17, 1.45s/it] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 283/500 [05:06<05:10, 1.43s/it] {'loss': 0.0, 'learning_rate': 4.3400000000000005e-05, 'epoch': 0.02}
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 283/500 [05:06<05:10, 1.43s/it] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 284/500 [05:07<05:08, 1.43s/it] {'loss': 0.0, 'learning_rate': 4.32e-05, 'epoch': 0.02}
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 284/500 [05:07<05:08, 1.43s/it] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 285/500 [05:09<05:08, 1.43s/it] {'loss': 0.0, 'learning_rate': 4.3e-05, 'epoch': 0.02}
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 285/500 [05:09<05:08, 1.43s/it] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 286/500 [05:10<05:07, 1.44s/it] {'loss': 0.0, 'learning_rate': 4.2800000000000004e-05, 'epoch': 0.02}
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 286/500 [05:10<05:07, 1.44s/it] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 287/500 [05:12<05:06, 1.44s/it] {'loss': 0.0, 'learning_rate': 4.26e-05, 'epoch': 0.02}
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 287/500 [05:12<05:06, 1.44s/it] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 288/500 [05:13<05:05, 1.44s/it] {'loss': 0.0, 'learning_rate': 4.24e-05, 'epoch': 0.02}
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 288/500 [05:13<05:05, 1.44s/it] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 289/500 [05:15<05:02, 1.43s/it] {'loss': 0.0, 'learning_rate': 4.22e-05, 'epoch': 0.02}
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 289/500 [05:15<05:02, 1.43s/it] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 290/500 [05:16<04:38, 1.33s/it] {'loss': 0.0, 'learning_rate': 4.2e-05, 'epoch': 0.02}
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 290/500 [05:16<04:38, 1.33s/it] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 291/500 [05:17<04:11, 1.20s/it] {'loss': 0.0, 'learning_rate': 4.18e-05, 'epoch': 0.02}
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 291/500 [05:17<04:11, 1.20s/it] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 292/500 [05:18<03:54, 1.13s/it] {'loss': 0.0, 'learning_rate': 4.16e-05, 'epoch': 0.02}
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 292/500 [05:18<03:54, 1.13s/it] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 293/500 [05:19<03:42, 1.07s/it] {'loss': 0.0, 'learning_rate': 4.14e-05, 'epoch': 0.02}
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 293/500 [05:19<03:42, 1.07s/it] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 294/500 [05:19<03:31, 1.03s/it] {'loss': 0.0, 'learning_rate': 4.12e-05, 'epoch': 0.02}
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 294/500 [05:19<03:31, 1.03s/it] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 295/500 [05:20<03:23, 1.01it/s] {'loss': 0.0, 'learning_rate': 4.1e-05, 'epoch': 0.02}
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 295/500 [05:20<03:23, 1.01it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 296/500 [05:21<03:16, 1.04it/s] {'loss': 0.0, 'learning_rate': 4.08e-05, 'epoch': 0.02}
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 296/500 [05:21<03:16, 1.04it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 297/500 [05:22<03:11, 1.06it/s] {'loss': 0.0, 'learning_rate': 4.0600000000000004e-05, 'epoch': 0.02}
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 297/500 [05:22<03:11, 1.06it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 298/500 [05:23<03:22, 1.00s/it] {'loss': 0.0, 'learning_rate': 4.0400000000000006e-05, 'epoch': 0.02}
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 298/500 [05:23<03:22, 1.00s/it] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 299/500 [05:25<03:44, 1.11s/it] {'loss': 0.0, 'learning_rate': 4.02e-05, 'epoch': 0.02}
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 299/500 [05:25<03:44, 1.11s/it] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 300/500 [05:26<04:02, 1.21s/it] {'loss': 0.0, 'learning_rate': 4e-05, 'epoch': 0.02}
60%|██████ | 300/500 [05:26<04:02, 1.21s/it]
[INFO|tokenization_utils_base.py:2437] 2023-12-10 15:32:48,867 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-300/tokenizer_config.json
[INFO|tokenization_utils_base.py:2446] 2023-12-10 15:32:48,867 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-300/special_tokens_map.json
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 301/500 [05:28<04:13, 1.28s/it] {'loss': 0.0, 'learning_rate': 3.9800000000000005e-05, 'epoch': 0.02}
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 301/500 [05:28<04:13, 1.28s/it] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 302/500 [05:29<04:03, 1.23s/it] {'loss': 0.0, 'learning_rate': 3.960000000000001e-05, 'epoch': 0.02}
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 302/500 [05:29<04:03, 1.23s/it] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 303/500 [05:30<03:43, 1.14s/it] {'loss': 0.0, 'learning_rate': 3.94e-05, 'epoch': 0.02}
61%|██████ | 304/500 [05:31<03:30, 1.08s/it] {'loss': 0.0, 'learning_rate': 3.9200000000000004e-05, 'epoch': 0.02}
61%|██████ | 305/500 [05:31<03:21, 1.03s/it] {'loss': 0.0, 'learning_rate': 3.9000000000000006e-05, 'epoch': 0.02}
61%|██████ | 306/500 [05:32<03:14, 1.00s/it] {'loss': 0.0, 'learning_rate': 3.88e-05, 'epoch': 0.02}
61%|██████▏ | 307/500 [05:33<03:07, 1.03it/s] {'loss': 0.0, 'learning_rate': 3.86e-05, 'epoch': 0.02}
62%|██████▏ | 308/500 [05:34<02:36, 1.22it/s] {'loss': 0.0, 'learning_rate': 3.8400000000000005e-05, 'epoch': 0.02}
62%|██████▏ | 309/500 [05:34<02:15, 1.41it/s] {'loss': 0.0, 'learning_rate': 3.82e-05, 'epoch': 0.02}
62%|██████▏ | 310/500 [05:35<01:59, 1.58it/s] {'loss': 0.0, 'learning_rate': 3.8e-05, 'epoch': 0.02}
62%|██████▏ | 311/500 [05:35<01:49, 1.73it/s] {'loss': 0.0, 'learning_rate': 3.7800000000000004e-05, 'epoch': 0.02}
62%|██████▏ | 312/500 [05:36<01:41, 1.85it/s] {'loss': 0.0, 'learning_rate': 3.76e-05, 'epoch': 0.02}
63%|██████▎ | 313/500 [05:36<01:36, 1.95it/s] {'loss': 0.0, 'learning_rate': 3.74e-05, 'epoch': 0.02}
63%|██████▎ | 314/500 [05:36<01:35, 1.95it/s] {'loss': 0.0, 'learning_rate': 3.72e-05, 'epoch': 0.02}
63%|██████▎ | 315/500 [05:37<01:40, 1.84it/s] {'loss': 0.0, 'learning_rate': 3.7e-05, 'epoch': 0.02}
63%|██████▎ | 316/500 [05:38<01:46, 1.74it/s] {'loss': 0.0, 'learning_rate': 3.68e-05, 'epoch': 0.02}
63%|██████▎ | 317/500 [05:38<01:51, 1.64it/s] {'loss': 0.0, 'learning_rate': 3.66e-05, 'epoch': 0.02}
64%|██████▎ | 318/500 [05:39<01:52, 1.62it/s] {'loss': 0.0, 'learning_rate': 3.6400000000000004e-05, 'epoch': 0.02}
64%|██████▍ | 319/500 [05:40<01:44, 1.72it/s] {'loss': 0.0, 'learning_rate': 3.62e-05, 'epoch': 0.02}
64%|██████▍ | 320/500 [05:40<01:46, 1.69it/s] {'loss': 0.0, 'learning_rate': 3.6e-05, 'epoch': 0.02}
64%|██████▍ | 321/500 [05:41<01:45, 1.70it/s] {'loss': 0.0, 'learning_rate': 3.58e-05, 'epoch': 0.02}
64%|██████▍ | 322/500 [05:41<01:44, 1.70it/s] {'loss': 0.0, 'learning_rate': 3.56e-05, 'epoch': 0.02}
65%|██████▍ | 323/500 [05:42<01:44, 1.69it/s] {'loss': 0.0, 'learning_rate': 3.54e-05, 'epoch': 0.02}
65%|██████▍ | 324/500 [05:43<01:47, 1.64it/s] {'loss': 0.0, 'learning_rate': 3.52e-05, 'epoch': 0.02}
65%|██████▌ | 325/500 [05:43<01:48, 1.62it/s] {'loss': 0.0, 'learning_rate': 3.5e-05, 'epoch': 0.02}
65%|██████▌ | 326/500 [05:44<01:51, 1.57it/s] {'loss': 0.0, 'learning_rate': 3.48e-05, 'epoch': 0.03}
65%|██████▌ | 327/500 [05:45<02:14, 1.29it/s] {'loss': 0.0, 'learning_rate': 3.46e-05, 'epoch': 0.03}
66%|██████▌ | 328/500 [05:46<02:44, 1.05it/s] {'loss': 0.0, 'learning_rate': 3.4399999999999996e-05, 'epoch': 0.03}
66%|██████▌ | 329/500 [05:48<03:05, 1.09s/it] {'loss': 0.0, 'learning_rate': 3.4200000000000005e-05, 'epoch': 0.03}
66%|██████▌ | 330/500 [05:49<03:11, 1.13s/it] {'loss': 0.0, 'learning_rate': 3.4000000000000007e-05, 'epoch': 0.03}
66%|██████▌ | 331/500 [05:50<03:03, 1.09s/it] {'loss': 0.0, 'learning_rate': 3.38e-05, 'epoch': 0.03}
66%|██████▋ | 332/500 [05:51<02:51, 1.02s/it] {'loss': 0.0, 'learning_rate': 3.3600000000000004e-05, 'epoch': 0.03}
67%|██████▋ | 333/500 [05:52<02:45, 1.01it/s] {'loss': 0.0, 'learning_rate': 3.3400000000000005e-05, 'epoch': 0.03}
67%|██████▋ | 334/500 [05:53<02:38, 1.05it/s] {'loss': 0.0, 'learning_rate': 3.32e-05, 'epoch': 0.03}
67%|██████▋ | 335/500 [05:54<02:34, 1.07it/s] {'loss': 0.0, 'learning_rate': 3.3e-05, 'epoch': 0.03}
67%|██████▋ | 336/500 [05:54<02:23, 1.14it/s] {'loss': 0.0, 'learning_rate': 3.2800000000000004e-05, 'epoch': 0.03}
67%|██████▋ | 337/500 [05:55<02:16, 1.19it/s] {'loss': 0.0, 'learning_rate': 3.26e-05, 'epoch': 0.03}
68%|██████▊ | 338/500 [05:56<02:12, 1.22it/s] {'loss': 0.0, 'learning_rate': 3.24e-05, 'epoch': 0.03}
68%|██████▊ | 339/500 [05:56<02:00, 1.33it/s] {'loss': 0.0, 'learning_rate': 3.2200000000000003e-05, 'epoch': 0.03}
68%|██████▊ | 340/500 [05:57<01:50, 1.45it/s] {'loss': 0.0, 'learning_rate': 3.2000000000000005e-05, 'epoch': 0.03}
68%|██████▊ | 341/500 [05:58<01:43, 1.53it/s] {'loss': 0.0, 'learning_rate': 3.18e-05, 'epoch': 0.03}
68%|██████▊ | 342/500 [05:58<01:42, 1.54it/s] {'loss': 0.0, 'learning_rate': 3.16e-05, 'epoch': 0.03}
69%|██████▊ | 343/500 [05:59<01:39, 1.57it/s] {'loss': 0.0, 'learning_rate': 3.1400000000000004e-05, 'epoch': 0.03}
69%|██████▉ | 344/500 [05:59<01:35, 1.63it/s] {'loss': 0.0, 'learning_rate': 3.12e-05, 'epoch': 0.03}
69%|██████▉ | 345/500 [06:00<01:34, 1.63it/s] {'loss': 0.0, 'learning_rate': 3.1e-05, 'epoch': 0.03}
69%|██████▉ | 346/500 [06:01<01:45, 1.46it/s] {'loss': 0.0, 'learning_rate': 3.08e-05, 'epoch': 0.03}
69%|██████▉ | 347/500 [06:02<02:05, 1.22it/s] {'loss': 0.0, 'learning_rate': 3.06e-05, 'epoch': 0.03}
70%|██████▉ | 348/500 [06:03<02:07, 1.19it/s] {'loss': 0.0, 'learning_rate': 3.04e-05, 'epoch': 0.03}
70%|██████▉ | 349/500 [06:04<02:08, 1.18it/s] {'loss': 0.0, 'learning_rate': 3.02e-05, 'epoch': 0.03}
70%|███████ | 350/500 [06:05<02:25, 1.03it/s] {'loss': 0.0, 'learning_rate': 3e-05, 'epoch': 0.03}
[INFO|tokenization_utils_base.py:2437] 2023-12-10 15:33:27,722 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-350/tokenizer_config.json
[INFO|tokenization_utils_base.py:2446] 2023-12-10 15:33:27,723 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-350/special_tokens_map.json
70%|███████ | 351/500 [06:06<02:45, 1.11s/it] {'loss': 0.0, 'learning_rate': 2.98e-05, 'epoch': 0.03}
70%|███████ | 352/500 [06:08<02:56, 1.19s/it] {'loss': 0.0, 'learning_rate': 2.96e-05, 'epoch': 0.03}
71%|███████ | 353/500 [06:09<03:04, 1.25s/it] {'loss': 0.0, 'learning_rate': 2.94e-05, 'epoch': 0.03}
71%|███████ | 354/500 [06:11<03:08, 1.29s/it] {'loss': 0.0, 'learning_rate': 2.9199999999999998e-05, 'epoch': 0.03}
71%|███████ | 355/500 [06:12<03:11, 1.32s/it] {'loss': 0.0, 'learning_rate': 2.9e-05, 'epoch': 0.03}
71%|███████ | 356/500 [06:13<03:11, 1.33s/it] {'loss': 0.0, 'learning_rate': 2.88e-05, 'epoch': 0.03}
71%|███████▏ | 357/500 [06:15<03:13, 1.35s/it] {'loss': 0.0, 'learning_rate': 2.86e-05, 'epoch': 0.03}
72%|███████▏ | 358/500 [06:16<03:15, 1.38s/it] {'loss': 0.0, 'learning_rate': 2.84e-05, 'epoch': 0.03}
72%|███████▏ | 359/500 [06:18<03:14, 1.38s/it] {'loss': 0.0, 'learning_rate': 2.8199999999999998e-05, 'epoch': 0.03}
72%|███████▏ | 360/500 [06:19<03:14, 1.39s/it] {'loss': 0.0, 'learning_rate': 2.8000000000000003e-05, 'epoch': 0.03}
72%|███████▏ | 361/500 [06:20<03:14, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.7800000000000005e-05, 'epoch': 0.03}
72%|███████▏ | 362/500 [06:22<03:12, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.7600000000000003e-05, 'epoch': 0.03}
73%|███████▎ | 363/500 [06:23<03:12, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.7400000000000002e-05, 'epoch': 0.03}
73%|███████▎ | 364/500 [06:25<03:11, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.7200000000000004e-05, 'epoch': 0.03}
73%|███████▎ | 365/500 [06:26<03:09, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.7000000000000002e-05, 'epoch': 0.03}
73%|███████▎ | 366/500 [06:27<03:07, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.6800000000000004e-05, 'epoch': 0.03}
73%|███████▎ | 367/500 [06:29<03:06, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.6600000000000003e-05, 'epoch': 0.03}
74%|███████▎ | 368/500 [06:30<03:06, 1.41s/it] {'loss': 0.0, 'learning_rate': 2.64e-05, 'epoch': 0.03}
74%|███████▍ | 369/500 [06:32<03:04, 1.41s/it] {'loss': 0.0, 'learning_rate': 2.6200000000000003e-05, 'epoch': 0.03}
74%|███████▍ | 370/500 [06:33<03:03, 1.41s/it] {'loss': 0.0, 'learning_rate': 2.6000000000000002e-05, 'epoch': 0.03}
74%|███████▍ | 371/500 [06:34<03:01, 1.41s/it] {'loss': 0.0, 'learning_rate': 2.58e-05, 'epoch': 0.03}
74%|███████▍ | 372/500 [06:36<02:59, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.5600000000000002e-05, 'epoch': 0.03}
75%|███████▍ | 373/500 [06:37<02:56, 1.39s/it] {'loss': 0.0, 'learning_rate': 2.54e-05, 'epoch': 0.03}
75%|███████▍ | 374/500 [06:39<02:55, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.5200000000000003e-05, 'epoch': 0.03}
75%|███████▌ | 375/500 [06:40<02:53, 1.39s/it] {'loss': 0.0, 'learning_rate': 2.5e-05, 'epoch': 0.03}
75%|███████▌ | 376/500 [06:41<02:52, 1.39s/it] {'loss': 0.0, 'learning_rate': 2.48e-05, 'epoch': 0.03}
75%|███████▌ | 377/500 [06:43<02:52, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.46e-05, 'epoch': 0.03}
76%|███████▌ | 378/500 [06:44<02:50, 1.40s/it] {'loss': 0.0, 'learning_rate': 2.44e-05, 'epoch': 0.03}
76%|███████▌ | 379/500 [06:46<02:51, 1.42s/it] {'loss': 0.0, 'learning_rate': 2.4200000000000002e-05, 'epoch': 0.03}
76%|███████▌ | 380/500 [06:47<02:51, 1.43s/it] {'loss': 0.0, 'learning_rate': 2.4e-05, 'epoch': 0.03}
76%|███████▌ | 381/500 [06:48<02:49, 1.43s/it] {'loss': 0.0, 'learning_rate': 2.38e-05, 'epoch': 0.03}
76%|███████▋ | 382/500 [06:50<02:46, 1.41s/it] {'loss': 0.0, 'learning_rate': 2.36e-05, 'epoch': 0.03}
77%|███████▋ | 383/500 [06:51<02:45, 1.42s/it] {'loss': 0.0, 'learning_rate': 2.3400000000000003e-05, 'epoch': 0.03}
77%|███████▋ | 384/500 [06:53<02:45, 1.42s/it] {'loss': 0.0, 'learning_rate': 2.32e-05, 'epoch': 0.03}
77%|███████▋ | 385/500 [06:54<02:44, 1.43s/it] {'loss': 0.0, 'learning_rate': 2.3000000000000003e-05, 'epoch': 0.03}
77%|███████▋ | 386/500 [06:56<02:44, 1.44s/it] {'loss': 0.0, 'learning_rate': 2.2800000000000002e-05, 'epoch': 0.03}
77%|███████▋ | 387/500 [06:57<02:42, 1.44s/it] {'loss': 0.0, 'learning_rate': 2.26e-05, 'epoch': 0.03}
78%|███████▊ | 388/500 [06:59<02:41, 1.45s/it] {'loss': 0.0, 'learning_rate': 2.2400000000000002e-05, 'epoch': 0.03}
78%|███████▊ | 389/500 [07:00<02:38, 1.43s/it] {'loss': 0.0, 'learning_rate': 2.22e-05, 'epoch': 0.03}
78%|███████▊ | 390/500 [07:01<02:22, 1.30s/it] {'loss': 0.0, 'learning_rate': 2.2000000000000003e-05, 'epoch': 0.03}
78%|███████▊ | 391/500 [07:02<02:08, 1.18s/it] {'loss': 0.0, 'learning_rate': 2.18e-05, 'epoch': 0.03}
78%|███████▊ | 392/500 [07:03<01:59, 1.10s/it] {'loss': 0.0, 'learning_rate': 2.16e-05, 'epoch': 0.03}
79%|███████▊ | 393/500 [07:04<01:51, 1.04s/it] {'loss': 0.0, 'learning_rate': 2.1400000000000002e-05, 'epoch': 0.03}
79%|███████▉ | 394/500 [07:05<01:46, 1.01s/it] {'loss': 0.0, 'learning_rate': 2.12e-05, 'epoch': 0.03}
79%|███████▉ | 395/500 [07:05<01:42, 1.03it/s] {'loss': 0.0, 'learning_rate': 2.1e-05, 'epoch': 0.03}
79%|███████▉ | 396/500 [07:06<01:35, 1.09it/s] {'loss': 0.0, 'learning_rate': 2.08e-05, 'epoch': 0.03}
79%|███████▉ | 397/500 [07:07<01:24, 1.23it/s] {'loss': 0.0, 'learning_rate': 2.06e-05, 'epoch': 0.03}
80%|███████▉ | 398/500 [07:08<01:20, 1.27it/s] {'loss': 0.0, 'learning_rate': 2.04e-05, 'epoch': 0.03}
80%|███████▉ | 399/500 [07:08<01:16, 1.33it/s] {'loss': 0.0, 'learning_rate': 2.0200000000000003e-05, 'epoch': 0.03}
80%|████████ | 400/500 [07:09<01:17, 1.29it/s] {'loss': 0.0, 'learning_rate': 2e-05, 'epoch': 0.03}
[INFO|tokenization_utils_base.py:2437] 2023-12-10 15:34:31,811 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-400/tokenizer_config.json
[INFO|tokenization_utils_base.py:2446] 2023-12-10 15:34:31,811 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-400/special_tokens_map.json
80%|████████ | 401/500 [07:10<01:18, 1.26it/s] {'loss': 0.0, 'learning_rate': 1.9800000000000004e-05, 'epoch': 0.03}
80%|████████ | 402/500 [07:11<01:34, 1.03it/s] {'loss': 0.0, 'learning_rate': 1.9600000000000002e-05, 'epoch': 0.03}
81%|████████ | 403/500 [07:13<01:43, 1.07s/it] {'loss': 0.0, 'learning_rate': 1.94e-05, 'epoch': 0.03}
81%|████████ | 404/500 [07:14<01:52, 1.18s/it] {'loss': 0.0, 'learning_rate': 1.9200000000000003e-05, 'epoch': 0.03}
81%|████████ | 405/500 [07:15<01:52, 1.19s/it] {'loss': 0.0, 'learning_rate': 1.9e-05, 'epoch': 0.03}
81%|████████ | 406/500 [07:16<01:44, 1.11s/it] {'loss': 0.0, 'learning_rate': 1.88e-05, 'epoch': 0.03}
81%|████████▏ | 407/500 [07:17<01:38, 1.06s/it] {'loss': 0.0, 'learning_rate': 1.86e-05, 'epoch': 0.03}
82%|████████▏ | 408/500 [07:18<01:32, 1.01s/it] {'loss': 0.0, 'learning_rate': 1.84e-05, 'epoch': 0.03}
82%|████████▏ | 409/500 [07:19<01:29, 1.02it/s] {'loss': 0.0, 'learning_rate': 1.8200000000000002e-05, 'epoch': 0.03}
82%|████████▏ | 410/500 [07:20<01:27, 1.03it/s] {'loss': 0.0, 'learning_rate': 1.8e-05, 'epoch': 0.03}
82%|████████▏ | 411/500 [07:21<01:24, 1.06it/s] {'loss': 0.0, 'learning_rate': 1.78e-05, 'epoch': 0.03}
82%|████████▏ | 412/500 [07:22<01:23, 1.06it/s] {'loss': 0.0, 'learning_rate': 1.76e-05, 'epoch': 0.03}
83%|████████▎ | 413/500 [07:23<01:20, 1.08it/s] {'loss': 0.0, 'learning_rate': 1.74e-05, 'epoch': 0.03}
83%|████████▎ | 414/500 [07:23<01:17, 1.11it/s] {'loss': 0.0, 'learning_rate': 1.7199999999999998e-05, 'epoch': 0.03}
83%|████████▎ | 415/500 [07:24<01:16, 1.12it/s] {'loss': 0.0, 'learning_rate': 1.7000000000000003e-05, 'epoch': 0.03}
83%|████████▎ | 416/500 [07:25<01:14, 1.12it/s] {'loss': 0.0, 'learning_rate': 1.6800000000000002e-05, 'epoch': 0.03}
83%|████████▎ | 417/500 [07:26<01:13, 1.13it/s] {'loss': 0.0, 'learning_rate': 1.66e-05, 'epoch': 0.03}
84%|████████▎ | 418/500 [07:27<01:15, 1.09it/s] {'loss': 0.0, 'learning_rate': 1.6400000000000002e-05, 'epoch': 0.03}
84%|████████▍ | 419/500 [07:28<01:19, 1.02it/s] {'loss': 0.0, 'learning_rate': 1.62e-05, 'epoch': 0.03}
84%|████████▍ | 420/500 [07:30<01:29, 1.11s/it] {'loss': 0.0, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.03}
84%|████████▍ | 421/500 [07:31<01:36, 1.23s/it] {'loss': 0.0, 'learning_rate': 1.58e-05, 'epoch': 0.03}
84%|████████▍ | 422/500 [07:32<01:39, 1.28s/it] {'loss': 0.0, 'learning_rate': 1.56e-05, 'epoch': 0.03}
85%|████████▍ | 423/500 [07:34<01:41, 1.32s/it] {'loss': 0.0, 'learning_rate': 1.54e-05, 'epoch': 0.03}
85%|████████▍ | 424/500 [07:35<01:42, 1.34s/it] {'loss': 0.0, 'learning_rate': 1.52e-05, 'epoch': 0.03}
85%|████████▌ | 425/500 [07:37<01:43, 1.37s/it] {'loss': 0.0, 'learning_rate': 1.5e-05, 'epoch': 0.03}
85%|████████▌ | 426/500 [07:38<01:42, 1.39s/it] {'loss': 0.0, 'learning_rate': 1.48e-05, 'epoch': 0.03}
85%|████████▌ | 427/500 [07:40<01:40, 1.38s/it] {'loss': 0.0, 'learning_rate': 1.4599999999999999e-05, 'epoch': 0.03}
86%|████████▌ | 428/500 [07:40<01:28, 1.23s/it] {'loss': 0.0, 'learning_rate': 1.44e-05, 'epoch': 0.03}
86%|████████▌ | 429/500 [07:41<01:20, 1.13s/it] {'loss': 0.0, 'learning_rate': 1.42e-05, 'epoch': 0.03}
86%|████████▌ | 430/500 [07:42<01:14, 1.06s/it] {'loss': 0.0, 'learning_rate': 1.4000000000000001e-05, 'epoch': 0.03}
86%|████████▌ | 431/500 [07:43<01:08, 1.00it/s] {'loss': 0.0, 'learning_rate': 1.3800000000000002e-05, 'epoch': 0.03}
86%|████████▋ | 432/500 [07:44<01:05, 1.04it/s] {'loss': 0.0, 'learning_rate': 1.3600000000000002e-05, 'epoch': 0.03}
87%|████████▋ | 433/500 [07:45<01:02, 1.07it/s] {'loss': 0.0, 'learning_rate': 1.3400000000000002e-05, 'epoch': 0.03}
87%|████████▋ | 434/500 [07:46<01:00, 1.09it/s] {'loss': 0.0, 'learning_rate': 1.32e-05, 'epoch': 0.03}
87%|████████▋ | 435/500 [07:47<00:58, 1.11it/s] {'loss': 0.0, 'learning_rate': 1.3000000000000001e-05, 'epoch': 0.03}
87%|████████▋ | 436/500 [07:47<00:54, 1.17it/s] {'loss': 0.0, 'learning_rate': 1.2800000000000001e-05, 'epoch': 0.03}
87%|████████▋ | 437/500 [07:48<00:49, 1.28it/s] {'loss': 0.0, 'learning_rate': 1.2600000000000001e-05, 'epoch': 0.03}
88%|████████▊ | 438/500 [07:48<00:44, 1.40it/s] {'loss': 0.0, 'learning_rate': 1.24e-05, 'epoch': 0.03}
88%|████████▊ | 439/500 [07:49<00:41, 1.46it/s] {'loss': 0.0, 'learning_rate': 1.22e-05, 'epoch': 0.03}
88%|████████▊ | 440/500 [07:50<00:39, 1.53it/s] {'loss': 0.0, 'learning_rate': 1.2e-05, 'epoch': 0.03}
88%|████████▊ | 441/500 [07:50<00:37, 1.56it/s] {'loss': 0.0, 'learning_rate': 1.18e-05, 'epoch': 0.03}
88%|████████▊ | 442/500 [07:51<00:44, 1.31it/s] {'loss': 0.0, 'learning_rate': 1.16e-05, 'epoch': 0.03}
89%|████████▊ | 443/500 [07:53<00:54, 1.05it/s] {'loss': 0.0, 'learning_rate': 1.1400000000000001e-05, 'epoch': 0.03}
89%|████████▉ | 444/500 [07:54<01:00, 1.07s/it] {'loss': 0.0, 'learning_rate': 1.1200000000000001e-05, 'epoch': 0.03}
89%|████████▉ | 445/500 [07:55<01:03, 1.16s/it] {'loss': 0.0, 'learning_rate': 1.1000000000000001e-05, 'epoch': 0.03}
89%|████████▉ | 446/500 [07:56<00:59, 1.10s/it] {'loss': 0.0, 'learning_rate': 1.08e-05, 'epoch': 0.03}
89%|████████▉ | 447/500 [07:57<00:55, 1.04s/it] {'loss': 0.0, 'learning_rate': 1.06e-05, 'epoch': 0.03}
90%|████████▉ | 448/500 [07:58<00:51, 1.00it/s] {'loss': 0.0, 'learning_rate': 1.04e-05, 'epoch': 0.03}
90%|████████▉ | 449/500 [07:59<00:46, 1.09it/s] {'loss': 0.0, 'learning_rate': 1.02e-05, 'epoch': 0.03}
90%|█████████ | 450/500 [08:00<00:45, 1.10it/s] {'loss': 0.0, 'learning_rate': 1e-05, 'epoch': 0.03}
[INFO|tokenization_utils_base.py:2437] 2023-12-10 15:35:22,531 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-450/tokenizer_config.json
[INFO|tokenization_utils_base.py:2446] 2023-12-10 15:35:22,531 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-450/special_tokens_map.json
90%|█████████ | 451/500 [08:01<00:44, 1.10it/s] {'loss': 0.0, 'learning_rate': 9.800000000000001e-06, 'epoch': 0.03}
90%|█████████ | 452/500 [08:02<00:42, 1.12it/s] {'loss': 0.0, 'learning_rate': 9.600000000000001e-06, 'epoch': 0.03}
91%|█████████ | 453/500 [08:02<00:41, 1.13it/s] {'loss': 0.0, 'learning_rate': 9.4e-06, 'epoch': 0.03}
91%|█████████ | 454/500 [08:03<00:40, 1.13it/s] {'loss': 0.0, 'learning_rate': 9.2e-06, 'epoch': 0.03}
91%|█████████ | 455/500 [08:04<00:38, 1.17it/s] {'loss': 0.0, 'learning_rate': 9e-06, 'epoch': 0.03}
91%|█████████ | 456/500 [08:05<00:34, 1.26it/s] {'loss': 0.0, 'learning_rate': 8.8e-06, 'epoch': 0.04}
91%|█████████▏| 457/500 [08:05<00:31, 1.35it/s] {'loss': 0.0, 'learning_rate': 8.599999999999999e-06, 'epoch': 0.04}
92%|█████████▏| 458/500 [08:06<00:29, 1.40it/s] {'loss': 0.0, 'learning_rate': 8.400000000000001e-06, 'epoch': 0.04}
92%|█████████▏| 459/500 [08:07<00:27, 1.49it/s] {'loss': 0.0, 'learning_rate': 8.200000000000001e-06, 'epoch': 0.04}
92%|█████████▏| 460/500 [08:07<00:26, 1.51it/s] {'loss': 0.0, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.04}
92%|█████████▏| 461/500 [08:08<00:25, 1.51it/s] {'loss': 0.0, 'learning_rate': 7.8e-06, 'epoch': 0.04}
92%|█████████▏| 462/500 [08:08<00:24, 1.58it/s] {'loss': 0.0, 'learning_rate': 7.6e-06, 'epoch': 0.04}
93%|█████████▎| 463/500 [08:09<00:24, 1.52it/s] {'loss': 0.0, 'learning_rate': 7.4e-06, 'epoch': 0.04}
93%|█████████▎| 464/500 [08:10<00:22, 1.63it/s] {'loss': 0.0, 'learning_rate': 7.2e-06, 'epoch': 0.04}
93%|█████████▎| 465/500 [08:10<00:21, 1.64it/s] {'loss': 0.0, 'learning_rate': 7.000000000000001e-06, 'epoch': 0.04}
93%|█████████▎| 466/500 [08:11<00:21, 1.60it/s] {'loss': 0.0, 'learning_rate': 6.800000000000001e-06, 'epoch': 0.04}
93%|█████████▎| 467/500 [08:12<00:21, 1.56it/s] {'loss': 0.0, 'learning_rate': 6.6e-06, 'epoch': 0.04}
94%|█████████▎| 468/500 [08:13<00:25, 1.26it/s] {'loss': 0.0, 'learning_rate': 6.4000000000000006e-06, 'epoch': 0.04}
94%|█████████▍| 469/500 [08:14<00:29, 1.04it/s] {'loss': 0.0, 'learning_rate': 6.2e-06, 'epoch': 0.04}
94%|█████████▍| 470/500 [08:16<00:33, 1.12s/it] {'loss': 0.0, 'learning_rate': 6e-06, 'epoch': 0.04}
94%|█████████▍| 471/500 [08:17<00:35, 1.21s/it] {'loss': 0.0, 'learning_rate': 5.8e-06, 'epoch': 0.04}
94%|█████████▍| 472/500 [08:18<00:35, 1.25s/it] {'loss': 0.0, 'learning_rate': 5.600000000000001e-06, 'epoch': 0.04}
95%|█████████▍| 473/500 [08:20<00:34, 1.29s/it] {'loss': 0.0, 'learning_rate': 5.4e-06, 'epoch': 0.04}
95%|█████████▍| 474/500 [08:21<00:34, 1.33s/it] {'loss': 0.0, 'learning_rate': 5.2e-06, 'epoch': 0.04}
95%|█████████▌| 475/500 [08:23<00:33, 1.35s/it] {'loss': 0.0, 'learning_rate': 5e-06, 'epoch': 0.04}
95%|█████████▌| 476/500 [08:24<00:32, 1.34s/it] {'loss': 0.0, 'learning_rate': 4.800000000000001e-06, 'epoch': 0.04}
95%|█████████▌| 477/500 [08:25<00:31, 1.36s/it] {'loss': 0.0, 'learning_rate': 4.6e-06, 'epoch': 0.04}
96%|█████████▌| 478/500 [08:27<00:30, 1.37s/it] {'loss': 0.0, 'learning_rate': 4.4e-06, 'epoch': 0.04}
96%|█████████▌| 479/500 [08:28<00:28, 1.37s/it] {'loss': 0.0, 'learning_rate': 4.2000000000000004e-06, 'epoch': 0.04}
96%|█████████▌| 480/500 [08:30<00:28, 1.41s/it] {'loss': 0.0, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.04}
96%|█████████▌| 481/500 [08:31<00:26, 1.41s/it] {'loss': 0.0, 'learning_rate': 3.8e-06, 'epoch': 0.04}
96%|█████████▋| 482/500 [08:32<00:25, 1.41s/it] {'loss': 0.0, 'learning_rate': 3.6e-06, 'epoch': 0.04}
97%|█████████▋| 483/500 [08:34<00:24, 1.42s/it] {'loss': 0.0, 'learning_rate': 3.4000000000000005e-06, 'epoch': 0.04}
97%|█████████▋| 484/500 [08:35<00:22, 1.43s/it] {'loss': 0.0, 'learning_rate': 3.2000000000000003e-06, 'epoch': 0.04}
97%|█████████▋| 485/500 [08:37<00:21, 1.44s/it] {'loss': 0.0, 'learning_rate': 3e-06, 'epoch': 0.04}
97%|█████████▋| 486/500 [08:38<00:20, 1.43s/it] {'loss': 0.0, 'learning_rate': 2.8000000000000003e-06, 'epoch': 0.04}
97%|█████████▋| 487/500 [08:40<00:18, 1.44s/it] {'loss': 0.0, 'learning_rate': 2.6e-06, 'epoch': 0.04}
98%|█████████▊| 488/500 [08:41<00:17, 1.43s/it] {'loss': 0.0, 'learning_rate': 2.4000000000000003e-06, 'epoch': 0.04}
98%|█████████▊| 489/500 [08:42<00:15, 1.42s/it] {'loss': 0.0, 'learning_rate': 2.2e-06, 'epoch': 0.04}
98%|█████████▊| 490/500 [08:44<00:14, 1.43s/it] {'loss': 0.0, 'learning_rate': 2.0000000000000003e-06, 'epoch': 0.04}
98%|█████████▊| 491/500 [08:45<00:12, 1.44s/it] {'loss': 0.0, 'learning_rate': 1.8e-06, 'epoch': 0.04}
98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 491/500 [08:45<00:12, 1.44s/it] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 492/500 [08:47<00:11, 1.45s/it] {'loss': 0.0, 'learning_rate': 1.6000000000000001e-06, 'epoch': 0.04}
98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 492/500 [08:47<00:11, 1.45s/it] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 493/500 [08:48<00:09, 1.39s/it] {'loss': 0.0, 'learning_rate': 1.4000000000000001e-06, 'epoch': 0.04}
99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 493/500 [08:48<00:09, 1.39s/it] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 494/500 [08:49<00:07, 1.25s/it] {'loss': 0.0, 'learning_rate': 1.2000000000000002e-06, 'epoch': 0.04}
99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 494/500 [08:49<00:07, 1.25s/it] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 495/500 [08:50<00:05, 1.15s/it] {'loss': 0.0, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.04}
99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 495/500 [08:50<00:05, 1.15s/it] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 496/500 [08:51<00:04, 1.08s/it] {'loss': 0.0, 'learning_rate': 8.000000000000001e-07, 'epoch': 0.04}
99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 496/500 [08:51<00:04, 1.08s/it] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 497/500 [08:52<00:03, 1.03s/it] {'loss': 0.0, 'learning_rate': 6.000000000000001e-07, 'epoch': 0.04}
99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 497/500 [08:52<00:03, 1.03s/it] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 498/500 [08:53<00:01, 1.01it/s] {'loss': 0.0, 'learning_rate': 4.0000000000000003e-07, 'epoch': 0.04}
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 498/500 [08:53<00:01, 1.01it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 499/500 [08:54<00:00, 1.04it/s] {'loss': 0.0, 'learning_rate': 2.0000000000000002e-07, 'epoch': 0.04}
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 499/500 [08:54<00:00, 1.04it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 500/500 [08:54<00:00, 1.13it/s] {'loss': 0.0, 'learning_rate': 0.0, 'epoch': 0.04}
[INFO|tokenization_utils_base.py:2437] 2023-12-10 15:36:16,994 >> tokenizer config file saved in output/text-20231210-152648-1e-4/checkpoint-500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2446] 2023-12-10 15:36:16,994 >> Special tokens file saved in output/text-20231210-152648-1e-4/checkpoint-500/special_tokens_map.json
[INFO|trainer.py:2017] 2023-12-10 15:36:17,040 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 536.6749, 'train_samples_per_second': 3.727, 'train_steps_per_second': 0.932, 'train_loss': 0.002908447265625, 'epoch': 0.04}
100%|██████████| 500/500 [08:54<00:00, 1.07s/it]
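
The logged learning rates and the final throughput summary can be cross-checked with a little arithmetic. The sketch below is only an illustration, not part of the training script. It assumes a linear decay to zero with no warmup and a peak learning rate of 1e-4; that peak is inferred from the `1e-4` suffix of the output directory and is not stated in this part of the log. The helper names (`linear_lr`, `steps_per_second`, `approx_total_samples`) are hypothetical.

```python
# Values copied from the log above.
TOTAL_STEPS = 500
TRAIN_RUNTIME_S = 536.6749
SAMPLES_PER_SECOND = 3.727

# Assumption: peak learning rate, inferred from the "1e-4" suffix of the
# output directory name; not confirmed by this part of the log.
BASE_LR = 1e-4


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS):
    """Learning rate after `step` optimizer steps, assuming linear decay to zero and no warmup."""
    return base_lr * (total_steps - step) / total_steps


def steps_per_second(total_steps=TOTAL_STEPS, runtime_s=TRAIN_RUNTIME_S):
    """Average optimizer steps per second over the whole run."""
    return total_steps / runtime_s


def approx_total_samples(samples_per_second=SAMPLES_PER_SECOND, runtime_s=TRAIN_RUNTIME_S):
    """Approximate number of training samples processed, rounded to the nearest integer."""
    return round(samples_per_second * runtime_s)


if __name__ == "__main__":
    # Compare against the 'learning_rate' values logged at these steps.
    for step in (470, 475, 490, 500):
        print(f"step {step}: lr = {linear_lr(step):.3e}")

    # Compare against 'train_steps_per_second' in the summary line.
    print(f"steps/s = {steps_per_second():.3f}")

    # Samples per optimizer step implied by the summary line.
    print(f"samples per step = {approx_total_samples() / TOTAL_STEPS:.1f}")
```

Under these assumptions the computed values agree with the log: the learning rate drops by 2e-07 per step and reaches 0.0 at step 500, and the step rate rounds to the reported 0.932.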
[INFO|tokenization_utils_base.py:2437] 2023-12-10 15:36:17,061 >> tokenizer config file saved in output/text-20231210-152648-1e-4/tokenizer_config.json
[INFO|tokenization_utils_base.py:2446] 2023-12-10 15:36:17,061 >> Special tokens file saved in output/text-20231210-152648-1e-4/special_tokens_map.json