W0729 11:42:27.572000 140695036090176 torch/distributed/run.py:757]
W0729 11:42:27.572000 140695036090176 torch/distributed/run.py:757] *****************************************
W0729 11:42:27.572000 140695036090176 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0729 11:42:27.572000 140695036090176 torch/distributed/run.py:757] *****************************************
[2024-07-29 11:42:28,986] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-29 11:42:29,011] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-29 11:42:29,013] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-29 11:42:29,021] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-29 11:42:29,024] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-29 11:42:29,028] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-29 11:42:29,028] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-29 11:42:29,033] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
Replace train sampler!!
petrel_client is not installed. Using PIL to load images.
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
Replace train sampler!!
Replace train sampler!!
petrel_client is not installed. Using PIL to load images.
petrel_client is not installed. Using PIL to load images.
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
Replace train sampler!!
Replace train sampler!!
petrel_client is not installed. Using PIL to load images.
petrel_client is not installed. Using PIL to load images.
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
Replace train sampler!!
petrel_client is not installed. Using PIL to load images.
[2024-07-29 11:42:32,558] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-29 11:42:32,558] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
Replace train sampler!!
petrel_client is not installed. Using PIL to load images.
07/29/2024 11:42:32 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
07/29/2024 11:42:32 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=4,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=/data/jcy/project/InternVL/internvl_chat/zero_stage3_config2.json,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=IntervalStrategy.NO,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=8,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=True,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/runs/Jul29_11-42-32_e028538ab8e8,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_kwargs={},
lr_scheduler_type=SchedulerType.COSINE,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=1.0,
optim=OptimizerNames.ADAMW_TORCH,
optim_args=None,
output_dir=/data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=2,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
ray_scope=last,
remove_unused_columns=True,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=/data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=200,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=2,
seed=42,
skip_memory_metrics=True,
split_batches=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.03,
warmup_steps=0,
weight_decay=0.05,
)
07/29/2024 11:42:32 - INFO - __main__ - Loading Tokenizer: /data/jcy/ckpt/internvl-chat-v1-5
[INFO|tokenization_utils_base.py:2025] 2024-07-29 11:42:32,588 >> loading file ./tokenizer.model
[INFO|tokenization_utils_base.py:2025] 2024-07-29 11:42:32,588 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2025] 2024-07-29 11:42:32,588 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2025] 2024-07-29 11:42:32,588 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2025] 2024-07-29 11:42:32,588 >> loading file tokenizer.json
[WARNING|logging.py:314] 2024-07-29 11:42:32,734 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
petrel_client is not installed. If you read data locally instead of from ceph, ignore it.
Replace train sampler!!
petrel_client is not installed. Using PIL to load images.
07/29/2024 11:42:32 - INFO - __main__ - Loading InternVLChatModel...
[INFO|configuration_utils.py:727] 2024-07-29 11:42:32,845 >> loading configuration file /data/jcy/ckpt/internvl-chat-v1-5/config.json
[INFO|configuration_utils.py:792] 2024-07-29 11:42:32,847 >> Model config InternVLChatConfig {
  "_commit_hash": null,
  "architectures": [
    "InternVLChatModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_internvl_chat.InternVLChatConfig",
    "AutoModel": "modeling_internvl_chat.InternVLChatModel",
    "AutoModelForCausalLM": "modeling_internvl_chat.InternVLChatModel"
  },
  "downsample_ratio": 0.5,
  "dynamic_image_size": true,
  "force_image_size": 448,
  "llm_config": {
    "_name_or_path": "internlm/internlm2-chat-20b",
    "add_cross_attention": false,
    "architectures": [
      "InternLM2ForCausalLM"
    ],
    "attn_implementation": "flash_attention_2",
    "auto_map": {
      "AutoConfig": "configuration_internlm2.InternLM2Config",
      "AutoModel": "modeling_internlm2.InternLM2ForCausalLM",
      "AutoModelForCausalLM": "modeling_internlm2.InternLM2ForCausalLM"
    },
    "bad_words_ids": null,
    "begin_suppress_tokens": null,
    "bias": false,
    "bos_token_id": 1,
    "chunk_size_feed_forward": 0,
    "cross_attention_hidden_size": null,
    "decoder_start_token_id": null,
    "diversity_penalty": 0.0,
    "do_sample": false,
    "early_stopping": false,
    "encoder_no_repeat_ngram_size": 0,
    "eos_token_id": 2,
    "exponential_decay_length_penalty": null,
    "finetuning_task": null,
    "forced_bos_token_id": null,
    "forced_eos_token_id": null,
    "hidden_act": "silu",
    "hidden_size": 6144,
    "id2label": {
      "0": "LABEL_0",
      "1": "LABEL_1"
    },
    "initializer_range": 0.02,
    "intermediate_size": 16384,
    "is_decoder": false,
    "is_encoder_decoder": false,
    "label2id": {
      "LABEL_0": 0,
      "LABEL_1": 1
    },
    "length_penalty": 1.0,
    "max_length": 20,
    "max_position_embeddings": 32768,
    "min_length": 0,
    "model_type": "internlm2",
    "no_repeat_ngram_size": 0,
    "num_attention_heads": 48,
    "num_beam_groups": 1,
    "num_beams": 1,
    "num_hidden_layers": 48,
    "num_key_value_heads": 8,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "pad_token_id": 2,
    "prefix": null,
    "problem_type": null,
    "pruned_heads": {},
    "remove_invalid_values": false,
    "repetition_penalty": 1.0,
    "return_dict": true,
    "return_dict_in_generate": false,
    "rms_norm_eps": 1e-05,
    "rope_scaling": {
      "factor": 3.0,
      "type": "dynamic"
    },
    "rope_theta": 1000000,
    "sep_token_id": null,
    "suppress_tokens": null,
    "task_specific_params": null,
    "temperature": 1.0,
    "tf_legacy_loss": false,
    "tie_encoder_decoder": false,
    "tie_word_embeddings": false,
    "tokenizer_class": null,
    "top_k": 50,
    "top_p": 1.0,
    "torch_dtype": "bfloat16",
    "torchscript": false,
    "transformers_version": "4.37.2",
    "typical_p": 1.0,
    "use_bfloat16": true,
    "use_cache": true,
    "vocab_size": 92553
  },
  "max_dynamic_patch": 12,
  "min_dynamic_patch": 1,
  "model_type": "internvl_chat",
  "pad2square": false,
  "ps_version": "v2",
  "select_layer": -1,
  "system_message": "You are an AI assistant whose name is InternLM (\u4e66\u751f\u00b7\u6d66\u8bed).",
  "template": "internlm2-chat",
  "torch_dtype": "bfloat16",
  "transformers_version": null,
  "use_backbone_lora": 0,
  "use_llm_lora": 0,
  "use_thumbnail": true,
  "vision_config": {
    "_name_or_path": "",
    "add_cross_attention": false,
    "architectures": [
      "InternVisionModel"
    ],
    "attention_dropout": 0.0,
    "bad_words_ids": null,
    "begin_suppress_tokens": null,
    "bos_token_id": null,
    "chunk_size_feed_forward": 0,
    "cross_attention_hidden_size": null,
    "decoder_start_token_id": null,
    "diversity_penalty": 0.0,
    "do_sample": false,
    "drop_path_rate": 0.0,
    "dropout": 0.0,
    "early_stopping": false,
    "encoder_no_repeat_ngram_size": 0,
    "eos_token_id": null,
    "exponential_decay_length_penalty": null,
    "finetuning_task": null,
    "forced_bos_token_id": null,
    "forced_eos_token_id": null,
    "hidden_act": "gelu",
    "hidden_size": 3200,
    "id2label": {
      "0": "LABEL_0",
      "1": "LABEL_1"
    },
    "image_size": 448,
    "initializer_factor": 0.1,
    "initializer_range": 1e-10,
    "intermediate_size": 12800,
    "is_decoder": false,
    "is_encoder_decoder": false,
    "label2id": {
      "LABEL_0": 0,
      "LABEL_1": 1
    },
    "layer_norm_eps": 1e-06,
    "length_penalty": 1.0,
    "max_length": 20,
    "min_length": 0,
    "model_type": "intern_vit_6b",
    "no_repeat_ngram_size": 0,
    "norm_type": "rms_norm",
    "num_attention_heads": 25,
    "num_beam_groups": 1,
    "num_beams": 1,
    "num_channels": 3,
    "num_hidden_layers": 45,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "pad_token_id": null,
    "patch_size": 14,
    "prefix": null,
    "problem_type": null,
    "pruned_heads": {},
    "qk_normalization": true,
    "qkv_bias": false,
    "remove_invalid_values": false,
    "repetition_penalty": 1.0,
    "return_dict": true,
    "return_dict_in_generate": false,
    "sep_token_id": null,
    "suppress_tokens": null,
    "task_specific_params": null,
    "temperature": 1.0,
    "tf_legacy_loss": false,
    "tie_encoder_decoder": false,
    "tie_word_embeddings": true,
    "tokenizer_class": null,
    "top_k": 50,
    "top_p": 1.0,
    "torch_dtype": "bfloat16",
    "torchscript": false,
    "transformers_version": "4.37.2",
    "typical_p": 1.0,
    "use_bfloat16": true,
    "use_flash_attn": true
  }
}
07/29/2024 11:42:32 - INFO - __main__ - Using flash_attention_2 for InternLM
[INFO|modeling_utils.py:3473] 2024-07-29 11:42:32,848 >> loading weights file /data/jcy/ckpt/internvl-chat-v1-5/model.safetensors.index.json
[INFO|modeling_utils.py:1426] 2024-07-29 11:42:32,848 >> Instantiating InternVLChatModel model under default dtype torch.bfloat16.
[INFO|modeling_utils.py:3582] 2024-07-29 11:42:32,849 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[INFO|configuration_utils.py:826] 2024-07-29 11:42:32,857 >> Generate config GenerationConfig {}
[2024-07-29 11:42:32,922] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-29 11:42:32,927] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-29 11:42:32,929] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-29 11:42:32,929] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-29 11:42:32,929] [INFO] [comm.py:637:init_distributed] cdb=None
07/29/2024 11:42:32 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: False
07/29/2024 11:42:32 - WARNING - __main__ - Process rank: 6, device: cuda:6, n_gpu: 1distributed training: True, 16-bits training: False
07/29/2024 11:42:32 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: False
07/29/2024 11:42:32 - WARNING - __main__ - Process rank: 7, device: cuda:7, n_gpu: 1distributed training: True, 16-bits training: False
[2024-07-29 11:42:32,948] [INFO] [comm.py:637:init_distributed] cdb=None
07/29/2024 11:42:32 - WARNING - __main__ - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: False
07/29/2024 11:42:32 - WARNING - __main__ - Process rank: 4, device: cuda:4, n_gpu: 1distributed training: True, 16-bits training: False
[2024-07-29 11:42:33,004] [INFO] [comm.py:637:init_distributed] cdb=None
07/29/2024 11:42:33 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1distributed training: True, 16-bits training: False
[WARNING|logging.py:314] 2024-07-29 11:42:33,093 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-29 11:42:33,094 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-29 11:42:33,096 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-29 11:42:33,097 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-29 11:42:33,102 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-29 11:42:33,122 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-29 11:42:33,179 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:826] 2024-07-29 11:42:35,011 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 2
}
[2024-07-29 11:42:35,246] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 934, num_elems = 25.51B
Loading checkpoint shards: 0%| | 0/11 [00:00<...]
[...] >> All model checkpoint weights were used when initializing InternVLChatModel.
[INFO|modeling_utils.py:4358] 2024-07-29 11:42:49,320 >> All the weights of InternVLChatModel were initialized from the model checkpoint at /data/jcy/ckpt/internvl-chat-v1-5.
If your task is similar to the task the model of the checkpoint was trained on, you can already use InternVLChatModel for predictions without further training.
[INFO|configuration_utils.py:779] 2024-07-29 11:42:49,327 >> loading configuration file /data/jcy/ckpt/internvl-chat-v1-5/generation_config.json
[INFO|configuration_utils.py:826] 2024-07-29 11:42:49,328 >> Generate config GenerationConfig {}
07/29/2024 11:42:49 - INFO - __main__ - Finished
07/29/2024 11:42:49 - INFO - __main__ - model.config.force_image_size: 448
07/29/2024 11:42:49 - INFO - __main__ - data_args.force_image_size: 448
07/29/2024 11:42:49 - INFO - __main__ - model.config.vision_config.image_size: 448
07/29/2024 11:42:49 - INFO - __main__ - [Dataset] num_image_token: 256
07/29/2024 11:42:49 - INFO - __main__ - [Dataset] dynamic_image_size: True
07/29/2024 11:42:49 - INFO - __main__ - [Dataset] use_thumbnail: True
07/29/2024 11:42:49 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
07/29/2024 11:42:49 - INFO - __main__ - Formatting inputs...Skip in lazy mode
[WARNING|tokenization_utils_base.py:3841] 2024-07-29 11:42:53,989 >> Token indices sequence length is longer than the specified maximum sequence length for this model (4272 > 4096). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:3841] 2024-07-29 11:42:54,036 >> Token indices sequence length is longer than the specified maximum sequence length for this model (4272 > 4096). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:3841] 2024-07-29 11:42:54,036 >> Token indices sequence length is longer than the specified maximum sequence length for this model (4272 > 4096). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:3841] 2024-07-29 11:42:54,040 >> Token indices sequence length is longer than the specified maximum sequence length for this model (4272 > 4096). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:3841] 2024-07-29 11:42:54,060 >> Token indices sequence length is longer than the specified maximum sequence length for this model (4272 > 4096). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:3841] 2024-07-29 11:42:54,109 >> Token indices sequence length is longer than the specified maximum sequence length for this model (4272 > 4096). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:3841] 2024-07-29 11:42:54,133 >> Token indices sequence length is longer than the specified maximum sequence length for this model (4272 > 4096). Running this sequence through the model will result in indexing errors
[WARNING|tokenization_utils_base.py:3841] 2024-07-29 11:42:54,418 >> Token indices sequence length is longer than the specified maximum sequence length for this model (4272 > 4096). Running this sequence through the model will result in indexing errors
07/29/2024 11:42:56 - INFO - __main__ - Add dataset: caption with length: 85997
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.tok_embeddings.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.0.attention.wqkv.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.0.attention.wo.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.0.feed_forward.w1.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.0.feed_forward.w3.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.0.feed_forward.w2.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.0.attention_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.0.ffn_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.1.attention.wqkv.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.1.attention.wo.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.1.feed_forward.w1.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.1.feed_forward.w3.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.1.feed_forward.w2.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.1.attention_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.1.ffn_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.2.attention.wqkv.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.2.attention.wo.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.2.feed_forward.w1.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.2.feed_forward.w3.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.2.feed_forward.w2.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.2.attention_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.2.ffn_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.3.attention.wqkv.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.3.attention.wo.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.3.feed_forward.w1.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.3.feed_forward.w3.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.3.feed_forward.w2.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.3.attention_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.3.ffn_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.4.attention.wqkv.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.4.attention.wo.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.4.feed_forward.w1.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.4.feed_forward.w3.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.4.feed_forward.w2.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.4.attention_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.4.ffn_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.5.attention.wqkv.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.5.attention.wo.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.5.feed_forward.w1.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.5.feed_forward.w3.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.5.feed_forward.w2.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.5.attention_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.5.ffn_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.6.attention.wqkv.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.6.attention.wo.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.6.feed_forward.w1.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.6.feed_forward.w3.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.6.feed_forward.w2.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.6.attention_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.6.ffn_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.7.attention.wqkv.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.7.attention.wo.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.7.feed_forward.w1.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.7.feed_forward.w3.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.7.feed_forward.w2.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.7.attention_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.7.ffn_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.8.attention.wqkv.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.8.attention.wo.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.8.feed_forward.w1.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.8.feed_forward.w3.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.8.feed_forward.w2.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.8.attention_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.8.ffn_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.9.attention.wqkv.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.9.attention.wo.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.9.feed_forward.w1.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.9.feed_forward.w3.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.9.feed_forward.w2.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.9.attention_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.9.ffn_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.10.attention.wqkv.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.10.attention.wo.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.10.feed_forward.w1.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.10.feed_forward.w3.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.10.feed_forward.w2.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.10.attention_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.10.ffn_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.11.attention.wqkv.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.11.attention.wo.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.11.feed_forward.w1.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.11.feed_forward.w3.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.11.feed_forward.w2.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.11.attention_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.11.ffn_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.12.attention.wqkv.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.12.attention.wo.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.12.feed_forward.w1.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.12.feed_forward.w3.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.12.feed_forward.w2.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.12.attention_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.12.ffn_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.13.attention.wqkv.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.13.attention.wo.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.13.feed_forward.w1.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.13.feed_forward.w3.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.13.feed_forward.w2.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.13.attention_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.13.ffn_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.14.attention.wqkv.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.14.attention.wo.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.14.feed_forward.w1.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.14.feed_forward.w3.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.14.feed_forward.w2.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.14.attention_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.14.ffn_norm.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.15.attention.wqkv.weight
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.15.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.15.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.15.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.15.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.15.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.15.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.16.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.16.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.16.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.16.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.16.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.16.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.16.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.17.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.17.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.17.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.17.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.17.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.17.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.17.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.18.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - 
language_model.model.layers.18.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.18.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.18.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.18.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.18.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.18.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.19.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.19.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.19.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.19.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.19.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.19.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.19.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.20.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.20.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.20.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.20.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.20.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.20.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.20.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.21.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.21.attention.wo.weight 07/29/2024 
11:42:56 - INFO - __main__ - language_model.model.layers.21.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.21.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.21.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.21.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.21.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.22.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.22.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.22.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.22.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.22.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.22.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.22.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.23.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.23.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.23.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.23.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.23.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.23.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.23.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.24.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.24.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - 
language_model.model.layers.24.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.24.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.24.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.24.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.24.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.25.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.25.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.25.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.25.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.25.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.25.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.25.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.26.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.26.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.26.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.26.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.26.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.26.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.26.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.27.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.27.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.27.feed_forward.w1.weight 
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.27.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.27.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.27.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.27.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.28.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.28.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.28.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.28.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.28.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.28.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.28.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.29.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.29.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.29.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.29.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.29.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.29.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.29.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.30.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.30.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.30.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - 
language_model.model.layers.30.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.30.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.30.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.30.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.31.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.31.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.31.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.31.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.31.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.31.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.31.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.32.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.32.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.32.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.32.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.32.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.32.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.32.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.33.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.33.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.33.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.33.feed_forward.w3.weight 
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.33.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.33.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.33.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.34.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.34.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.34.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.34.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.34.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.34.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.34.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.35.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.35.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.35.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.35.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.35.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.35.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.35.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.36.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.36.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.36.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.36.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - 
language_model.model.layers.36.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.36.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.36.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.37.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.37.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.37.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.37.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.37.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.37.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.37.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.38.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.38.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.38.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.38.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.38.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.38.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.38.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.39.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.39.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.39.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.39.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.39.feed_forward.w2.weight 
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.39.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.39.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.40.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.40.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.40.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.40.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.40.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.40.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.40.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.41.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.41.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.41.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.41.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.41.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.41.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.41.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.42.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.42.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.42.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.42.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.42.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - 
language_model.model.layers.42.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.42.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.43.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.43.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.43.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.43.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.43.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.43.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.43.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.44.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.44.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.44.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.44.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.44.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.44.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.44.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.45.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.45.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.45.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.45.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.45.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.45.attention_norm.weight 
07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.45.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.46.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.46.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.46.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.46.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.46.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.46.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.46.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.47.attention.wqkv.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.47.attention.wo.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.47.feed_forward.w1.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.47.feed_forward.w3.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.47.feed_forward.w2.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.47.attention_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.layers.47.ffn_norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.model.norm.weight 07/29/2024 11:42:56 - INFO - __main__ - language_model.output.weight 07/29/2024 11:42:56 - INFO - __main__ - mlp1.0.weight 07/29/2024 11:42:56 - INFO - __main__ - mlp1.0.bias 07/29/2024 11:42:56 - INFO - __main__ - mlp1.1.weight 07/29/2024 11:42:56 - INFO - __main__ - mlp1.1.bias 07/29/2024 11:42:56 - INFO - __main__ - mlp1.3.weight 07/29/2024 11:42:56 - INFO - __main__ - mlp1.3.bias [INFO|trainer.py:571] 2024-07-29 11:42:56,577 >> Using auto half precision backend Using /home/jcy/.cache/torch_extensions/py312_cu121 as PyTorch extensions 
root... Using /home/jcy/.cache/torch_extensions/py312_cu121 as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /home/jcy/.cache/torch_extensions/py312_cu121/fused_adam/build.ninja... Using /home/jcy/.cache/torch_extensions/py312_cu121 as PyTorch extensions root... Building extension module fused_adam... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) Using /home/jcy/.cache/torch_extensions/py312_cu121 as PyTorch extensions root... [2024-07-29 11:42:56,756] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.12.6, git-hash=unknown, git-branch=unknown Using /home/jcy/.cache/torch_extensions/py312_cu121 as PyTorch extensions root... [2024-07-29 11:42:56,779] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False ninja: no work to do. Loading extension module fused_adam... Time to load fused_adam op: 0.14140725135803223 seconds Using /home/jcy/.cache/torch_extensions/py312_cu121 as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /home/jcy/.cache/torch_extensions/py312_cu121/fused_adam/build.ninja... Building extension module fused_adam... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) Using /home/jcy/.cache/torch_extensions/py312_cu121 as PyTorch extensions root... ninja: no work to do. Loading extension module fused_adam... Time to load fused_adam op: 0.13640904426574707 seconds Loading extension module fused_adam... Time to load fused_adam op: 0.2025158405303955 seconds Loading extension module fused_adam... Loading extension module fused_adam... Time to load fused_adam op: 0.30295395851135254 seconds Time to load fused_adam op: 0.20264315605163574 seconds Loading extension module fused_adam... Time to load fused_adam op: 0.10249996185302734 seconds Loading extension module fused_adam... 
Time to load fused_adam op: 0.30277514457702637 seconds Using /home/jcy/.cache/torch_extensions/py312_cu121 as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /home/jcy/.cache/torch_extensions/py312_cu121/fused_adam/build.ninja... Building extension module fused_adam... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_adam... Time to load fused_adam op: 0.13980412483215332 seconds [2024-07-29 11:42:57,334] [INFO] [logging.py:96:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adamw as basic optimizer [2024-07-29 11:42:57,334] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer [2024-07-29 11:42:57,379] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam [2024-07-29 11:42:57,379] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2024-07-29 11:42:57,379] [INFO] [logging.py:96:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer, MiCS is enabled False, Hierarchical params gather False [2024-07-29 11:42:57,379] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 3 optimizer [2024-07-29 11:42:57,558] [INFO] [utils.py:791:see_memory_usage] Stage 3 initialize beginning [2024-07-29 11:42:57,559] [INFO] [utils.py:792:see_memory_usage] MA 6.69 GB Max_MA 8.81 GB CA 6.97 GB Max_CA 9 GB [2024-07-29 11:42:57,559] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 30.46 GB, percent = 3.0% [2024-07-29 11:42:57,563] [INFO] [stage3.py:127:__init__] Reduce bucket size 1000000000 [2024-07-29 11:42:57,564] [INFO] [stage3.py:128:__init__] Prefetch bucket size 1000000000 [2024-07-29 11:42:57,742] [INFO] [utils.py:791:see_memory_usage] DeepSpeedZeRoOffload initialize [begin] [2024-07-29 11:42:57,743] [INFO] [utils.py:792:see_memory_usage] MA 6.69 GB 
Max_MA 6.69 GB CA 6.97 GB Max_CA 7 GB [2024-07-29 11:42:57,743] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 30.46 GB, percent = 3.0% Parameter Offload: Total persistent parameters: 7529856 in 510 params [2024-07-29 11:42:57,980] [INFO] [utils.py:791:see_memory_usage] DeepSpeedZeRoOffload initialize [end] [2024-07-29 11:42:57,981] [INFO] [utils.py:792:see_memory_usage] MA 6.69 GB Max_MA 6.69 GB CA 6.97 GB Max_CA 7 GB [2024-07-29 11:42:57,981] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 30.46 GB, percent = 3.0% [2024-07-29 11:42:58,183] [INFO] [utils.py:791:see_memory_usage] Before creating fp16 partitions [2024-07-29 11:42:58,183] [INFO] [utils.py:792:see_memory_usage] MA 6.69 GB Max_MA 6.69 GB CA 6.97 GB Max_CA 7 GB [2024-07-29 11:42:58,183] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 30.46 GB, percent = 3.0% [2024-07-29 11:43:02,349] [INFO] [utils.py:791:see_memory_usage] After creating fp16 partitions: 3 [2024-07-29 11:43:02,350] [INFO] [utils.py:792:see_memory_usage] MA 6.69 GB Max_MA 6.69 GB CA 10.76 GB Max_CA 11 GB [2024-07-29 11:43:02,350] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 38.35 GB, percent = 3.8% [2024-07-29 11:43:02,540] [INFO] [utils.py:791:see_memory_usage] Before creating fp32 partitions [2024-07-29 11:43:02,541] [INFO] [utils.py:792:see_memory_usage] MA 6.69 GB Max_MA 6.69 GB CA 10.76 GB Max_CA 11 GB [2024-07-29 11:43:02,541] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 30.47 GB, percent = 3.0% [2024-07-29 11:43:02,734] [INFO] [utils.py:791:see_memory_usage] After creating fp32 partitions [2024-07-29 11:43:02,735] [INFO] [utils.py:792:see_memory_usage] MA 16.0 GB Max_MA 16.91 GB CA 21.99 GB Max_CA 22 GB [2024-07-29 11:43:02,735] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 30.47 GB, percent = 3.0% [2024-07-29 11:43:02,925] [INFO] [utils.py:791:see_memory_usage] Before initializing optimizer states [2024-07-29 
11:43:02,926] [INFO] [utils.py:792:see_memory_usage] MA 16.0 GB Max_MA 16.0 GB CA 21.99 GB Max_CA 22 GB [2024-07-29 11:43:02,926] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 30.47 GB, percent = 3.0% [2024-07-29 11:43:02,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | init_optimizer_state: 57.96 [2024-07-29 11:43:03,178] [INFO] [utils.py:791:see_memory_usage] After initializing optimizer states [2024-07-29 11:43:03,178] [INFO] [utils.py:792:see_memory_usage] MA 34.6 GB Max_MA 38.35 GB CA 42.53 GB Max_CA 43 GB [2024-07-29 11:43:03,178] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 30.48 GB, percent = 3.0% [2024-07-29 11:43:03,179] [INFO] [stage3.py:479:_setup_for_real_optimizer] optimizer state initialized [2024-07-29 11:43:03,607] [INFO] [utils.py:791:see_memory_usage] After initializing ZeRO optimizer [2024-07-29 11:43:03,608] [INFO] [utils.py:792:see_memory_usage] MA 41.12 GB Max_MA 43.24 GB CA 48.24 GB Max_CA 48 GB [2024-07-29 11:43:03,608] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 30.48 GB, percent = 3.0% [2024-07-29 11:43:03,608] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = adamw [2024-07-29 11:43:03,608] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = WarmupCosineLR [2024-07-29 11:43:03,608] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2024-07-29 11:43:03,608] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-05], mom=[[0.9, 0.999]] [2024-07-29 11:43:03,611] [INFO] [config.py:984:print] DeepSpeedEngine configuration: [2024-07-29 11:43:03,611] [INFO] [config.py:988:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2024-07-29 11:43:03,611] [INFO] [config.py:988:print] aio_config ................... 
{'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2024-07-29 11:43:03,611] [INFO] [config.py:988:print] amp_enabled .................. False [2024-07-29 11:43:03,611] [INFO] [config.py:988:print] amp_params ................... False [2024-07-29 11:43:03,611] [INFO] [config.py:988:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2024-07-29 11:43:03,611] [INFO] [config.py:988:print] bfloat16_enabled ............. True [2024-07-29 11:43:03,611] [INFO] [config.py:988:print] checkpoint_parallel_write_pipeline False [2024-07-29 11:43:03,611] [INFO] [config.py:988:print] checkpoint_tag_validation_enabled True [2024-07-29 11:43:03,611] [INFO] [config.py:988:print] checkpoint_tag_validation_fail False [2024-07-29 11:43:03,611] [INFO] [config.py:988:print] comms_config ................. [2024-07-29 11:43:03,611] [INFO] [config.py:988:print] communication_data_type ...... None [2024-07-29 11:43:03,611] [INFO] [config.py:988:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2024-07-29 11:43:03,611] [INFO] [config.py:988:print] curriculum_enabled_legacy .... False [2024-07-29 11:43:03,611] [INFO] [config.py:988:print] curriculum_params_legacy ..... False [2024-07-29 11:43:03,611] [INFO] [config.py:988:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] data_efficiency_enabled ...... False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] dataloader_drop_last ......... False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] disable_allgather ............ False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] dump_state ................... 
False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] dynamic_loss_scale_args ...... None [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] eigenvalue_enabled ........... False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] eigenvalue_gas_boundary_resolution 1 [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] eigenvalue_layer_name ........ bert.encoder.layer [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] eigenvalue_layer_num ......... 0 [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] eigenvalue_max_iter .......... 100 [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] eigenvalue_stability ......... 1e-06 [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] eigenvalue_tol ............... 0.01 [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] eigenvalue_verbose ........... False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] elasticity_enabled ........... False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] fp16_auto_cast ............... None [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] fp16_enabled ................. False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] fp16_master_weights_and_gradients False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] global_rank .................. 0 [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] grad_accum_dtype ............. None [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] gradient_accumulation_steps .. 8 [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] gradient_clipping ............ 1.0 [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] gradient_predivide_factor .... 1.0 [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] graph_harvesting ............. 
False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] initial_dynamic_scale ........ 1 [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] load_universal_checkpoint .... False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] loss_scale ................... 1.0 [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] memory_breakdown ............. False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] mics_hierarchial_params_gather False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] mics_shard_size .............. -1 [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] optimizer_legacy_fusion ...... False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] optimizer_name ............... adamw [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] optimizer_params ............. {'lr': 2e-05, 'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0.05} [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] pipeline ..................... 
{'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True} [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] pld_enabled .................. False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] pld_params ................... False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] prescale_gradients ........... False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] scheduler_name ............... WarmupCosineLR [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] scheduler_params ............. {'warmup_min_ratio': 0, 'cos_min_ratio': 0, 'warmup_num_steps': 21, 'warmup_type': 'linear', 'total_num_steps': 671} [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] seq_parallel_communication_data_type torch.float32 [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] sparse_attention ............. None [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] sparse_gradients_enabled ..... False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] steps_per_print .............. inf [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] train_batch_size ............. 128 [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] train_micro_batch_size_per_gpu 2 [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] use_data_before_expert_parallel_ False [2024-07-29 11:43:03,612] [INFO] [config.py:988:print] use_node_local_storage ....... False [2024-07-29 11:43:03,613] [INFO] [config.py:988:print] wall_clock_breakdown ......... True [2024-07-29 11:43:03,613] [INFO] [config.py:988:print] weight_quantization_config ... None [2024-07-29 11:43:03,613] [INFO] [config.py:988:print] world_size ................... 8 [2024-07-29 11:43:03,613] [INFO] [config.py:988:print] zero_allow_untested_optimizer False [2024-07-29 11:43:03,613] [INFO] [config.py:988:print] zero_config .................. 
stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=1000000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=1000000000 param_persistence_threshold=10000000 model_persistence_threshold=sys.maxsize max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=True stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True [2024-07-29 11:43:03,613] [INFO] [config.py:988:print] zero_enabled ................. True [2024-07-29 11:43:03,613] [INFO] [config.py:988:print] zero_force_ds_cpu_optimizer .. True [2024-07-29 11:43:03,613] [INFO] [config.py:988:print] zero_optimization_stage ...... 
3 [2024-07-29 11:43:03,613] [INFO] [config.py:974:print_user_config] json = { "zero_optimization": { "stage": 3, "overlap_comm": true, "contiguous_gradients": true, "sub_group_size": 1.000000e+09, "reduce_bucket_size": 1.000000e+09, "stage3_prefetch_bucket_size": 1.000000e+09, "stage3_param_persistence_threshold": 1.000000e+07, "stage3_max_live_parameters": 1.000000e+09, "stage3_max_reuse_distance": 1.000000e+09, "stage3_gather_16bit_weights_on_model_save": true }, "fp16": { "enabled": false, "auto_cast": true, "loss_scale": 0, "initial_scale_power": 32, "loss_scale_window": 1000, "hysteresis": 2, "min_loss_scale": 1 }, "bf16": { "enabled": true }, "optimizer": { "type": "AdamW", "params": { "lr": 2e-05, "betas": [0.9, 0.999], "eps": 1e-08, "weight_decay": 0.05 } }, "scheduler": { "type": "WarmupCosineLR", "params": { "warmup_min_ratio": 0, "cos_min_ratio": 0, "warmup_num_steps": 21, "warmup_type": "linear", "total_num_steps": 671 } }, "gradient_accumulation_steps": 8, "gradient_clipping": 1.0, "steps_per_print": inf, "train_batch_size": 128, "train_micro_batch_size_per_gpu": 2, "wall_clock_breakdown": true } [INFO|trainer.py:1721] 2024-07-29 11:43:03,613 >> ***** Running training ***** [INFO|trainer.py:1722] 2024-07-29 11:43:03,613 >> Num examples = 85,997 [INFO|trainer.py:1723] 2024-07-29 11:43:03,613 >> Num Epochs = 1 [INFO|trainer.py:1724] 2024-07-29 11:43:03,613 >> Instantaneous batch size per device = 2 [INFO|trainer.py:1727] 2024-07-29 11:43:03,613 >> Total train batch size (w. 
parallel, distributed & accumulation) = 128 [INFO|trainer.py:1728] 2024-07-29 11:43:03,613 >> Gradient Accumulation steps = 8 [INFO|trainer.py:1729] 2024-07-29 11:43:03,613 >> Total optimization steps = 671 [INFO|trainer.py:1730] 2024-07-29 11:43:03,617 >> Number of trainable parameters = 19,977,690,112 [INFO|integration_utils.py:722] 2024-07-29 11:43:03,620 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true" wandb: Currently logged in as: darrendong (pku_kcl). Use `wandb login --relogin` to force relogin wandb: - Waiting for wandb.init()... wandb: \ Waiting for wandb.init()... wandb: wandb version 0.17.5 is available! To upgrade, please run: wandb: $ pip install wandb --upgrade wandb: Tracking run with wandb version 0.17.0 wandb: Run data is saved locally in /data/jcy/project/InternVL/internvl_chat/wandb/run-20240729_114309-8a7wdzgp wandb: Run `wandb offline` to turn off syncing. wandb: Syncing run swept-microwave-27 wandb: ⭐️ View project at https://wandb.ai/pku_kcl/huggingface wandb: 🚀 View run at https://wandb.ai/pku_kcl/huggingface/runs/8a7wdzgp 0%| | 0/671 [00:00> Saving model checkpoint to /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-200 [INFO|configuration_utils.py:473] 2024-07-29 15:36:24,650 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-200/config.json [INFO|configuration_utils.py:594] 2024-07-29 15:36:24,650 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-200/generation_config.json [INFO|modeling_utils.py:2501] 2024-07-29 15:37:16,107 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 11 checkpoint shards. You can find where each parameters has been saved in the index located at /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-200/model.safetensors.index.json. 
[INFO|tokenization_utils_base.py:2433] 2024-07-29 15:37:16,108 >> tokenizer config file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-200/tokenizer_config.json [INFO|tokenization_utils_base.py:2442] 2024-07-29 15:37:16,109 >> Special tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-200/special_tokens_map.json [INFO|tokenization_utils_base.py:2493] 2024-07-29 15:37:16,109 >> added tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-200/added_tokens.json [2024-07-29 15:37:16,147] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step200 is about to be saved! [2024-07-29 15:37:16,706] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-200/global_step200/zero_pp_rank_0_mp_rank_00_model_states.pt [2024-07-29 15:37:16,707] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-200/global_step200/zero_pp_rank_0_mp_rank_00_model_states.pt... [2024-07-29 15:37:18,461] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-200/global_step200/zero_pp_rank_0_mp_rank_00_model_states.pt. [2024-07-29 15:37:18,882] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-200/global_step200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-07-29 15:38:12,789] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-200/global_step200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[2024-07-29 15:38:12,790] [INFO] [engine.py:3431:_save_zero_checkpoint] zero checkpoint saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-200/global_step200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-07-29 15:38:17,808] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step200 is ready now! dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3963 [2024-07-29 15:38:26,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.71 | bwd_microstep: 5168.69 | bwd_inner_microstep: 5108.06 | bwd_allreduce_microstep: 60.57 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2258 [2024-07-29 15:38:35,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3495.42 | bwd_microstep: 5192.41 | bwd_inner_microstep: 4787.45 | bwd_allreduce_microstep: 404.89 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2071 [2024-07-29 15:38:44,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3489.57 | bwd_microstep: 5173.56 | bwd_inner_microstep: 4771.23 | bwd_allreduce_microstep: 402.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3618 [2024-07-29 15:38:52,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3201.08 | bwd_microstep: 4826.90 | bwd_inner_microstep: 4782.74 | bwd_allreduce_microstep: 44.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-29 15:39:00,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.34 | bwd_microstep: 5116.80 | bwd_inner_microstep: 5048.02 | bwd_allreduce_microstep: 68.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-29 15:39:09,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3499.92 | bwd_microstep: 5083.75 | bwd_inner_microstep: 4688.58 | bwd_allreduce_microstep: 395.10 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3732 [2024-07-29 15:39:17,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.65 | bwd_microstep: 5001.05 | bwd_inner_microstep: 4954.33 | bwd_allreduce_microstep: 46.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-29 15:39:26,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 15:39:26,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.27 | bwd_microstep: 5070.45 | bwd_inner_microstep: 5025.62 | bwd_allreduce_microstep: 44.75 | step_microstep: 181.34 [2024-07-29 15:39:26,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28027.87 | bwd: 40633.59 | bwd_inner: 39165.98 | bwd_allreduce: 1467.14 | step: 181.91 30%|██▉ | 201/671 [3:56:12<14:30:27, 111.12s/it] {'loss': 1.1685, 'learning_rate': 1.6448422127361707e-05, 'epoch': 0.3} 30%|██▉ | 201/671 [3:56:12<14:30:27, 111.12s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3956 [2024-07-29 15:39:35,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3817.87 | bwd_microstep: 5247.04 | bwd_inner_microstep: 5216.29 | bwd_allreduce_microstep: 30.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-07-29 15:39:44,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.36 | bwd_microstep: 5198.09 | bwd_inner_microstep: 5136.90 | bwd_allreduce_microstep: 61.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2204 [2024-07-29 15:39:53,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.26 | bwd_microstep: 5193.83 | bwd_inner_microstep: 4790.17 | 
bwd_allreduce_microstep: 403.60 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2212 [2024-07-29 15:40:01,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3029.03 | bwd_microstep: 4954.08 | bwd_inner_microstep: 4572.55 | bwd_allreduce_microstep: 381.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-29 15:40:10,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.71 | bwd_microstep: 5123.75 | bwd_inner_microstep: 5057.15 | bwd_allreduce_microstep: 66.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-07-29 15:40:18,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.46 | bwd_microstep: 4990.93 | bwd_inner_microstep: 4938.48 | bwd_allreduce_microstep: 52.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2139 [2024-07-29 15:40:27,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.54 | bwd_microstep: 5113.04 | bwd_inner_microstep: 4715.73 | bwd_allreduce_microstep: 397.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3676 [2024-07-29 15:40:36,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-29 15:40:36,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.14 | bwd_microstep: 4870.43 | bwd_inner_microstep: 4851.01 | bwd_allreduce_microstep: 19.34 | step_microstep: 182.42 [2024-07-29 15:40:36,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28305.29 | bwd: 40691.17 | bwd_inner: 39278.22 | bwd_allreduce: 1412.48 | step: 182.99 30%|███ | 202/671 [3:57:22<12:50:35, 98.58s/it] {'loss': 1.2085, 'learning_rate': 1.64114058975328e-05, 'epoch': 0.3} 30%|███ | 202/671 [3:57:22<12:50:35, 98.58s/it]dynamic ViT batch size: 26, images 
per sample: 13.0, dynamic token length: 3884 [2024-07-29 15:40:45,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3802.09 | bwd_microstep: 5177.55 | bwd_inner_microstep: 5158.42 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2098 [2024-07-29 15:40:53,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.39 | bwd_microstep: 5230.45 | bwd_inner_microstep: 4825.13 | bwd_allreduce_microstep: 405.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2236 [2024-07-29 15:41:02,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.61 | bwd_microstep: 5213.42 | bwd_inner_microstep: 4809.57 | bwd_allreduce_microstep: 403.79 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2219 [2024-07-29 15:41:11,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.43 | bwd_microstep: 5224.97 | bwd_inner_microstep: 4821.10 | bwd_allreduce_microstep: 403.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3638 [2024-07-29 15:41:19,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3210.66 | bwd_microstep: 4822.50 | bwd_inner_microstep: 4780.91 | bwd_allreduce_microstep: 41.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2194 [2024-07-29 15:41:27,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3010.44 | bwd_microstep: 4892.35 | bwd_inner_microstep: 4518.26 | bwd_allreduce_microstep: 374.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2165 [2024-07-29 15:41:36,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.35 | bwd_microstep: 5105.44 | bwd_inner_microstep: 4709.19 | bwd_allreduce_microstep: 396.18 
| step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-29 15:41:44,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 15:41:44,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.74 | bwd_microstep: 4722.59 | bwd_inner_microstep: 4697.20 | bwd_allreduce_microstep: 25.32 | step_microstep: 180.92 [2024-07-29 15:41:44,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27424.63 | bwd: 40389.27 | bwd_inner: 38319.73 | bwd_allreduce: 2069.06 | step: 181.49 30%|███ | 203/671 [3:58:30<11:37:42, 89.45s/it] {'loss': 1.2296, 'learning_rate': 1.63742398974869e-05, 'epoch': 0.3} 30%|███ | 203/671 [3:58:30<11:37:42, 89.45s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2142 [2024-07-29 15:41:53,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.38 | bwd_microstep: 5243.75 | bwd_inner_microstep: 4838.61 | bwd_allreduce_microstep: 405.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3584 [2024-07-29 15:42:01,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.66 | bwd_microstep: 5137.30 | bwd_inner_microstep: 5057.05 | bwd_allreduce_microstep: 80.19 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3741 [2024-07-29 15:42:10,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.66 | bwd_microstep: 5158.76 | bwd_inner_microstep: 5106.51 | bwd_allreduce_microstep: 52.19 | step_microstep: 0.07 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3599 [2024-07-29 15:42:18,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3147.82 | bwd_microstep: 4971.05 | bwd_inner_microstep: 4908.88 | bwd_allreduce_microstep: 62.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token 
length: 3645 [2024-07-29 15:42:27,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.34 | bwd_microstep: 5161.95 | bwd_inner_microstep: 5083.99 | bwd_allreduce_microstep: 77.90 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-29 15:42:36,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.70 | bwd_microstep: 5218.97 | bwd_inner_microstep: 4814.96 | bwd_allreduce_microstep: 403.96 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3678 [2024-07-29 15:42:44,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.72 | bwd_microstep: 4862.31 | bwd_inner_microstep: 4842.93 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3681 [2024-07-29 15:42:53,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-29 15:42:53,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.50 | bwd_microstep: 4895.44 | bwd_inner_microstep: 4874.66 | bwd_allreduce_microstep: 20.71 | step_microstep: 181.14 [2024-07-29 15:42:53,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28449.69 | bwd: 40649.53 | bwd_inner: 39527.52 | bwd_allreduce: 1121.54 | step: 181.71 30%|███ | 204/671 [3:59:39<10:49:27, 83.44s/it] {'loss': 1.1602, 'learning_rate': 1.6336924995420453e-05, 'epoch': 0.3} 30%|███ | 204/671 [3:59:39<10:49:27, 83.44s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3992 [2024-07-29 15:43:02,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3890.93 | bwd_microstep: 5329.99 | bwd_inner_microstep: 5299.85 | bwd_allreduce_microstep: 30.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3592 [2024-07-29 15:43:11,921] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3658.58 | bwd_microstep: 5293.80 | bwd_inner_microstep: 5196.71 | bwd_allreduce_microstep: 97.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3806 [2024-07-29 15:43:20,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3233.84 | bwd_microstep: 4837.99 | bwd_inner_microstep: 4818.61 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2217 [2024-07-29 15:43:28,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.45 | bwd_microstep: 5171.30 | bwd_inner_microstep: 4768.22 | bwd_allreduce_microstep: 403.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667 [2024-07-29 15:43:36,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.87 | bwd_microstep: 4804.62 | bwd_inner_microstep: 4766.81 | bwd_allreduce_microstep: 37.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756 [2024-07-29 15:43:45,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.47 | bwd_microstep: 5012.43 | bwd_inner_microstep: 4993.05 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-29 15:43:54,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.36 | bwd_microstep: 5153.60 | bwd_inner_microstep: 5082.13 | bwd_allreduce_microstep: 71.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-07-29 15:44:03,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-29 15:44:03,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3704.65 | bwd_microstep: 4987.90 | bwd_inner_microstep: 4968.50 | bwd_allreduce_microstep: 19.33 | 
step_microstep: 180.83 [2024-07-29 15:44:03,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28564.06 | bwd: 40591.60 | bwd_inner: 39893.84 | bwd_allreduce: 697.29 | step: 181.40 31%|███ | 205/671 [4:00:49<10:15:40, 79.27s/it] {'loss': 1.26, 'learning_rate': 1.6299462063008272e-05, 'epoch': 0.31} 31%|███ | 205/671 [4:00:49<10:15:40, 79.27s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2353 [2024-07-29 15:44:12,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.61 | bwd_microstep: 5299.43 | bwd_inner_microstep: 4892.17 | bwd_allreduce_microstep: 407.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3848 [2024-07-29 15:44:21,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3653.03 | bwd_microstep: 5191.51 | bwd_inner_microstep: 5136.29 | bwd_allreduce_microstep: 55.16 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2218 [2024-07-29 15:44:29,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.97 | bwd_microstep: 5222.53 | bwd_inner_microstep: 4816.81 | bwd_allreduce_microstep: 405.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3745 [2024-07-29 15:44:37,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.83 | bwd_microstep: 4824.43 | bwd_inner_microstep: 4800.95 | bwd_allreduce_microstep: 23.41 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2172 [2024-07-29 15:44:46,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.74 | bwd_microstep: 5149.74 | bwd_inner_microstep: 4748.73 | bwd_allreduce_microstep: 400.95 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-07-29 15:44:55,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3507.46 | bwd_microstep: 5110.05 | bwd_inner_microstep: 4713.56 | bwd_allreduce_microstep: 396.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-29 15:45:03,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3221.85 | bwd_microstep: 4832.53 | bwd_inner_microstep: 4790.68 | bwd_allreduce_microstep: 41.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3672 [2024-07-29 15:45:12,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 15:45:12,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.80 | bwd_microstep: 5009.21 | bwd_inner_microstep: 4954.20 | bwd_allreduce_microstep: 54.95 | step_microstep: 181.30 [2024-07-29 15:45:12,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27819.20 | bwd: 40639.42 | bwd_inner: 38853.33 | bwd_allreduce: 1785.63 | step: 181.87 31%|███ | 206/671 [4:01:57<9:49:58, 76.13s/it] {'loss': 1.1697, 'learning_rate': 1.626185197538314e-05, 'epoch': 0.31} 31%|███ | 206/671 [4:01:57<9:49:58, 76.13s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3910 [2024-07-29 15:45:20,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3290.12 | bwd_microstep: 4956.89 | bwd_inner_microstep: 4935.65 | bwd_allreduce_microstep: 21.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3810 [2024-07-29 15:45:29,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.31 | bwd_microstep: 5234.68 | bwd_inner_microstep: 5179.17 | bwd_allreduce_microstep: 55.44 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2288 [2024-07-29 15:45:37,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.64 | bwd_microstep: 5165.57 | bwd_inner_microstep: 4764.07 | 
bwd_allreduce_microstep: 401.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-29 15:45:46,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.47 | bwd_microstep: 5106.08 | bwd_inner_microstep: 5040.43 | bwd_allreduce_microstep: 65.58 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2192 [2024-07-29 15:45:55,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3471.66 | bwd_microstep: 5124.77 | bwd_inner_microstep: 4725.58 | bwd_allreduce_microstep: 399.12 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2227 [2024-07-29 15:46:04,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.19 | bwd_microstep: 5240.60 | bwd_inner_microstep: 4834.01 | bwd_allreduce_microstep: 406.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-29 15:46:12,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.05 | bwd_microstep: 5008.57 | bwd_inner_microstep: 4956.65 | bwd_allreduce_microstep: 51.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-07-29 15:46:21,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-29 15:46:21,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.20 | bwd_microstep: 5003.41 | bwd_inner_microstep: 4951.92 | bwd_allreduce_microstep: 51.42 | step_microstep: 181.12 [2024-07-29 15:46:21,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28133.52 | bwd: 40840.55 | bwd_inner: 39387.42 | bwd_allreduce: 1452.66 | step: 181.70 31%|███ | 207/671 [4:03:07<9:32:52, 74.08s/it] {'loss': 1.2056, 'learning_rate': 1.6224095611115385e-05, 'epoch': 0.31} 31%|███ | 207/671 [4:03:07<9:32:52, 74.08s/it]dynamic ViT batch size: 20, images 
per sample: 10.0, dynamic token length: 3922 [2024-07-29 15:46:30,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3682.80 | bwd_microstep: 5224.49 | bwd_inner_microstep: 5185.68 | bwd_allreduce_microstep: 38.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3572 [2024-07-29 15:46:39,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.64 | bwd_microstep: 5215.18 | bwd_inner_microstep: 5120.10 | bwd_allreduce_microstep: 95.02 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3800 [2024-07-29 15:46:47,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3781.21 | bwd_microstep: 5053.61 | bwd_inner_microstep: 5029.55 | bwd_allreduce_microstep: 24.00 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2184 [2024-07-29 15:46:56,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.76 | bwd_microstep: 5237.27 | bwd_inner_microstep: 4829.92 | bwd_allreduce_microstep: 407.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2166 [2024-07-29 15:47:05,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3506.87 | bwd_microstep: 5126.91 | bwd_inner_microstep: 4729.54 | bwd_allreduce_microstep: 397.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-29 15:47:13,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3201.48 | bwd_microstep: 4742.98 | bwd_inner_microstep: 4717.54 | bwd_allreduce_microstep: 25.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2151 [2024-07-29 15:47:22,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.61 | bwd_microstep: 5169.56 | bwd_inner_microstep: 4768.78 | bwd_allreduce_microstep: 400.71 
| step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3672 [2024-07-29 15:47:30,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-29 15:47:30,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.36 | bwd_microstep: 5056.06 | bwd_inner_microstep: 4980.49 | bwd_allreduce_microstep: 75.50 | step_microstep: 180.99 [2024-07-29 15:47:30,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28474.65 | bwd: 40826.04 | bwd_inner: 39361.53 | bwd_allreduce: 1464.05 | step: 181.56 31%|███ | 208/671 [4:04:16<9:21:20, 72.74s/it] {'loss': 1.205, 'learning_rate': 1.6186193852192356e-05, 'epoch': 0.31} 31%|███ | 208/671 [4:04:16<9:21:20, 72.74s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3633 [2024-07-29 15:47:39,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3172.48 | bwd_microstep: 5131.46 | bwd_inner_microstep: 5049.85 | bwd_allreduce_microstep: 81.55 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3823 [2024-07-29 15:47:48,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3773.43 | bwd_microstep: 5103.53 | bwd_inner_microstep: 5078.82 | bwd_allreduce_microstep: 24.64 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2270 [2024-07-29 15:47:56,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.85 | bwd_microstep: 5121.32 | bwd_inner_microstep: 4722.29 | bwd_allreduce_microstep: 398.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717 [2024-07-29 15:48:05,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.02 | bwd_microstep: 5168.41 | bwd_inner_microstep: 5110.89 | bwd_allreduce_microstep: 57.46 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token 
length: 3654 [2024-07-29 15:48:14,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.55 | bwd_microstep: 5008.45 | bwd_inner_microstep: 4939.30 | bwd_allreduce_microstep: 69.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710 [2024-07-29 15:48:22,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.13 | bwd_microstep: 5062.96 | bwd_inner_microstep: 5002.58 | bwd_allreduce_microstep: 60.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3700 [2024-07-29 15:48:31,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.70 | bwd_microstep: 4965.63 | bwd_inner_microstep: 4931.83 | bwd_allreduce_microstep: 33.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-29 15:48:40,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.45 [2024-07-29 15:48:40,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.86 | bwd_microstep: 4999.34 | bwd_inner_microstep: 4946.67 | bwd_allreduce_microstep: 52.60 | step_microstep: 180.87 [2024-07-29 15:48:40,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28459.92 | bwd: 40561.09 | bwd_inner: 39782.18 | bwd_allreduce: 778.45 | step: 181.44 31%|███ | 209/671 [4:05:26<9:12:17, 71.73s/it] {'loss': 1.1565, 'learning_rate': 1.6148147583997813e-05, 'epoch': 0.31} 31%|███ | 209/671 [4:05:26<9:12:17, 71.73s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3819 [2024-07-29 15:48:49,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.71 | bwd_microstep: 5086.98 | bwd_inner_microstep: 5067.90 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3857 [2024-07-29 15:48:58,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 3785.25 | bwd_microstep: 5130.59 | bwd_inner_microstep: 5111.23 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2276 [2024-07-29 15:49:06,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.59 | bwd_microstep: 5185.33 | bwd_inner_microstep: 4782.92 | bwd_allreduce_microstep: 402.34 | step_microstep: 0.09 dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2099 [2024-07-29 15:49:15,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.49 | bwd_microstep: 5209.75 | bwd_inner_microstep: 4804.95 | bwd_allreduce_microstep: 404.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701 [2024-07-29 15:49:24,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.76 | bwd_microstep: 4997.66 | bwd_inner_microstep: 4944.66 | bwd_allreduce_microstep: 52.93 | step_microstep: 0.07 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3715 [2024-07-29 15:49:32,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3095.97 | bwd_microstep: 4902.76 | bwd_inner_microstep: 4863.56 | bwd_allreduce_microstep: 39.14 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3717 [2024-07-29 15:49:40,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.06 | bwd_microstep: 4991.34 | bwd_inner_microstep: 4936.06 | bwd_allreduce_microstep: 55.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-07-29 15:49:48,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-29 15:49:48,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3192.16 | bwd_microstep: 4690.27 | bwd_inner_microstep: 4667.94 | bwd_allreduce_microstep: 22.25 | step_microstep: 
180.74 [2024-07-29 15:49:48,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27972.90 | bwd: 40194.66 | bwd_inner: 39179.17 | bwd_allreduce: 1015.02 | step: 181.32 31%|███▏ | 210/671 [4:06:34<9:03:38, 70.76s/it] {'loss': 1.1724, 'learning_rate': 1.6109957695291246e-05, 'epoch': 0.31} 31%|███▏ | 210/671 [4:06:34<9:03:38, 70.76s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3942 [2024-07-29 15:50:00,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 6363.31 | bwd_microstep: 5187.40 | bwd_inner_microstep: 5129.46 | bwd_allreduce_microstep: 57.88 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3564 [2024-07-29 15:50:09,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.59 | bwd_microstep: 5153.84 | bwd_inner_microstep: 5050.26 | bwd_allreduce_microstep: 103.52 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2334 [2024-07-29 15:50:18,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.12 | bwd_microstep: 5310.58 | bwd_inner_microstep: 4899.82 | bwd_allreduce_microstep: 410.69 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2180 [2024-07-29 15:50:26,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.68 | bwd_microstep: 5146.41 | bwd_inner_microstep: 4745.72 | bwd_allreduce_microstep: 400.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-07-29 15:50:35,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.05 | bwd_microstep: 5117.65 | bwd_inner_microstep: 5044.49 | bwd_allreduce_microstep: 73.08 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3770 [2024-07-29 15:50:43,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.05 | 
bwd_microstep: 4954.85 | bwd_inner_microstep: 4925.94 | bwd_allreduce_microstep: 28.85 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3726 [2024-07-29 15:50:52,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.60 | bwd_microstep: 5033.14 | bwd_inner_microstep: 5004.42 | bwd_allreduce_microstep: 28.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731 [2024-07-29 15:51:01,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-29 15:51:01,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.40 | bwd_microstep: 4995.33 | bwd_inner_microstep: 4975.86 | bwd_allreduce_microstep: 19.40 | step_microstep: 181.94 [2024-07-29 15:51:01,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 31652.70 | bwd: 40899.19 | bwd_inner: 39775.91 | bwd_allreduce: 1122.80 | step: 182.53 31%|███▏ | 211/671 [4:07:47<9:07:20, 71.39s/it] {'loss': 1.1426, 'learning_rate': 1.6071625078187113e-05, 'epoch': 0.31} 31%|███▏ | 211/671 [4:07:47<9:07:20, 71.39s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3867 [2024-07-29 15:51:10,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.14 | bwd_microstep: 5270.48 | bwd_inner_microstep: 5213.97 | bwd_allreduce_microstep: 56.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3838 [2024-07-29 15:51:18,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3235.15 | bwd_microstep: 4864.61 | bwd_inner_microstep: 4845.25 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2270 [2024-07-29 15:51:27,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.91 | bwd_microstep: 5196.95 | bwd_inner_microstep: 4792.42 | bwd_allreduce_microstep: 404.46 
| step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3754 [2024-07-29 15:51:36,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.93 | bwd_microstep: 4942.48 | bwd_inner_microstep: 4911.40 | bwd_allreduce_microstep: 31.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3629 [2024-07-29 15:51:44,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.74 | bwd_microstep: 5149.81 | bwd_inner_microstep: 5074.67 | bwd_allreduce_microstep: 75.07 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2212 [2024-07-29 15:51:53,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.04 | bwd_microstep: 5146.08 | bwd_inner_microstep: 4744.37 | bwd_allreduce_microstep: 401.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-29 15:52:02,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.68 | bwd_microstep: 4996.31 | bwd_inner_microstep: 4942.21 | bwd_allreduce_microstep: 54.03 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3669 [2024-07-29 15:52:10,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 15:52:10,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.12 | bwd_microstep: 5015.24 | bwd_inner_microstep: 4938.83 | bwd_allreduce_microstep: 76.34 | step_microstep: 181.70 [2024-07-29 15:52:10,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28240.60 | bwd: 40581.93 | bwd_inner: 39463.06 | bwd_allreduce: 1118.39 | step: 182.39 32%|███▏ | 212/671 [4:08:56<9:01:00, 70.72s/it] {'loss': 1.2175, 'learning_rate': 1.603315062813401e-05, 'epoch': 0.32} 32%|███▏ | 212/671 [4:08:56<9:01:00, 70.72s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token 
length: 3854 [2024-07-29 15:52:19,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.18 | bwd_microstep: 5041.06 | bwd_inner_microstep: 5015.64 | bwd_allreduce_microstep: 25.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3831 [2024-07-29 15:52:28,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.85 | bwd_microstep: 5057.03 | bwd_inner_microstep: 5023.17 | bwd_allreduce_microstep: 33.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3776 [2024-07-29 15:52:37,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.56 | bwd_microstep: 5205.51 | bwd_inner_microstep: 5149.84 | bwd_allreduce_microstep: 55.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3755 [2024-07-29 15:52:45,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.05 | bwd_microstep: 5173.75 | bwd_inner_microstep: 5120.49 | bwd_allreduce_microstep: 53.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667 [2024-07-29 15:52:53,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3233.84 | bwd_microstep: 4828.34 | bwd_inner_microstep: 4787.38 | bwd_allreduce_microstep: 40.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-29 15:53:02,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.08 | bwd_microstep: 5178.70 | bwd_inner_microstep: 5103.57 | bwd_allreduce_microstep: 75.07 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2170 [2024-07-29 15:53:11,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.31 | bwd_microstep: 5098.88 | bwd_inner_microstep: 4703.96 | bwd_allreduce_microstep: 394.86 | step_microstep: 0.08 dynamic 
ViT batch size: 26, images per sample: 13.0, dynamic token length: 3683 [2024-07-29 15:53:20,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.83 [2024-07-29 15:53:20,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.51 | bwd_microstep: 4880.29 | bwd_inner_microstep: 4861.02 | bwd_allreduce_microstep: 19.20 | step_microstep: 181.22 [2024-07-29 15:53:20,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28587.26 | bwd: 40463.54 | bwd_inner: 39765.02 | bwd_allreduce: 698.05 | step: 181.81 32%|███▏ | 213/671 [4:10:06<8:56:46, 70.32s/it] {'loss': 1.205, 'learning_rate': 1.5994535243893742e-05, 'epoch': 0.32} 32%|███▏ | 213/671 [4:10:06<8:56:46, 70.32s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3848 [2024-07-29 15:53:29,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.03 | bwd_microstep: 5187.69 | bwd_inner_microstep: 5118.41 | bwd_allreduce_microstep: 69.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3868 [2024-07-29 15:53:37,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3784.58 | bwd_microstep: 5109.54 | bwd_inner_microstep: 5090.17 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2255 [2024-07-29 15:53:46,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.11 | bwd_microstep: 5199.20 | bwd_inner_microstep: 4795.19 | bwd_allreduce_microstep: 403.94 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756 [2024-07-29 15:53:55,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.80 | bwd_microstep: 5014.19 | bwd_inner_microstep: 4994.74 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3751 [2024-07-29 
15:54:04,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.58 | bwd_microstep: 5193.66 | bwd_inner_microstep: 5137.07 | bwd_allreduce_microstep: 56.53 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3650 [2024-07-29 15:54:13,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.53 | bwd_microstep: 5168.22 | bwd_inner_microstep: 5074.57 | bwd_allreduce_microstep: 93.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3690 [2024-07-29 15:54:21,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.95 | bwd_microstep: 5206.98 | bwd_inner_microstep: 5132.57 | bwd_allreduce_microstep: 74.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731 [2024-07-29 15:54:30,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 15:54:30,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.20 | bwd_microstep: 5013.73 | bwd_inner_microstep: 4994.27 | bwd_allreduce_microstep: 19.39 | step_microstep: 182.12 [2024-07-29 15:54:30,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29221.69 | bwd: 41093.19 | bwd_inner: 40336.94 | bwd_allreduce: 755.77 | step: 182.71 32%|███▏ | 214/671 [4:11:16<8:56:20, 70.42s/it] {'loss': 1.2054, 'learning_rate': 1.5955779827520327e-05, 'epoch': 0.32} 32%|███▏ | 214/671 [4:11:16<8:56:20, 70.42s/it]dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 840 [2024-07-29 15:54:39,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.17 | bwd_microstep: 5459.23 | bwd_inner_microstep: 5037.73 | bwd_allreduce_microstep: 421.44 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2256 [2024-07-29 15:54:48,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.76 | 
bwd_microstep: 5249.80 | bwd_inner_microstep: 4841.77 | bwd_allreduce_microstep: 407.96 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2245 [2024-07-29 15:54:57,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.94 | bwd_microstep: 5160.76 | bwd_inner_microstep: 4759.10 | bwd_allreduce_microstep: 401.60 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2217 [2024-07-29 15:55:06,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.14 | bwd_microstep: 5284.71 | bwd_inner_microstep: 4874.02 | bwd_allreduce_microstep: 410.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-29 15:55:14,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3196.21 | bwd_microstep: 4775.04 | bwd_inner_microstep: 4736.05 | bwd_allreduce_microstep: 38.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-07-29 15:55:23,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.59 | bwd_microstep: 5162.12 | bwd_inner_microstep: 5085.21 | bwd_allreduce_microstep: 76.84 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-29 15:55:31,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.49 | bwd_microstep: 4886.47 | bwd_inner_microstep: 4867.12 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-29 15:55:40,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.45 [2024-07-29 15:55:40,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.99 | bwd_microstep: 4937.55 | bwd_inner_microstep: 4912.90 | bwd_allreduce_microstep: 24.58 | step_microstep: 181.05 [2024-07-29 
15:55:40,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28478.20 | bwd: 40915.65 | bwd_inner: 39113.85 | bwd_allreduce: 1801.33 | step: 181.62 32%|███▏ | 215/671 [4:12:26<8:53:34, 70.21s/it] {'loss': 1.2056, 'learning_rate': 1.5916885284338937e-05, 'epoch': 0.32} 32%|███▏ | 215/671 [4:12:26<8:53:34, 70.21s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3947 [2024-07-29 15:55:48,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3341.79 | bwd_microstep: 4993.58 | bwd_inner_microstep: 4970.06 | bwd_allreduce_microstep: 23.45 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2064 [2024-07-29 15:55:57,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.15 | bwd_microstep: 5277.10 | bwd_inner_microstep: 4867.12 | bwd_allreduce_microstep: 409.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3585 [2024-07-29 15:56:06,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.75 | bwd_microstep: 5167.73 | bwd_inner_microstep: 5088.03 | bwd_allreduce_microstep: 79.63 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2115 [2024-07-29 15:56:14,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3047.08 | bwd_microstep: 5034.82 | bwd_inner_microstep: 4649.17 | bwd_allreduce_microstep: 385.58 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729 [2024-07-29 15:56:23,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3698.33 | bwd_microstep: 4971.15 | bwd_inner_microstep: 4951.88 | bwd_allreduce_microstep: 19.20 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2139 [2024-07-29 15:56:31,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3007.44 | bwd_microstep: 
4911.56 | bwd_inner_microstep: 4537.30 | bwd_allreduce_microstep: 374.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701 [2024-07-29 15:56:39,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.73 | bwd_microstep: 4996.37 | bwd_inner_microstep: 4944.77 | bwd_allreduce_microstep: 51.53 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-07-29 15:56:48,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 15:56:48,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.23 | bwd_microstep: 4981.35 | bwd_inner_microstep: 4932.91 | bwd_allreduce_microstep: 48.37 | step_microstep: 182.16 [2024-07-29 15:56:48,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27326.42 | bwd: 40333.63 | bwd_inner: 38941.17 | bwd_allreduce: 1391.97 | step: 182.73 32%|███▏ | 216/671 [4:13:34<8:47:21, 69.54s/it] {'loss': 1.1235, 'learning_rate': 1.5877852522924733e-05, 'epoch': 0.32} 32%|███▏ | 216/671 [4:13:34<8:47:21, 69.54s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2071 [2024-07-29 15:56:56,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3085.44 | bwd_microstep: 5135.90 | bwd_inner_microstep: 4742.51 | bwd_allreduce_microstep: 393.33 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3777 [2024-07-29 15:57:05,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.38 | bwd_microstep: 5206.81 | bwd_inner_microstep: 5150.31 | bwd_allreduce_microstep: 56.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3613 [2024-07-29 15:57:14,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.61 | bwd_microstep: 5191.14 | bwd_inner_microstep: 5107.61 | bwd_allreduce_microstep: 83.47 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-07-29 15:57:23,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.04 | bwd_microstep: 5187.97 | bwd_inner_microstep: 5130.29 | bwd_allreduce_microstep: 57.61 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-29 15:57:32,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.22 | bwd_microstep: 5198.22 | bwd_inner_microstep: 5119.44 | bwd_allreduce_microstep: 78.71 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2179 [2024-07-29 15:57:41,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3814.29 | bwd_microstep: 5207.87 | bwd_inner_microstep: 4800.34 | bwd_allreduce_microstep: 407.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707 [2024-07-29 15:57:49,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.45 | bwd_microstep: 4975.56 | bwd_inner_microstep: 4942.85 | bwd_allreduce_microstep: 32.65 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3694 [2024-07-29 15:57:58,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-29 15:57:58,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.94 | bwd_microstep: 4888.52 | bwd_inner_microstep: 4869.08 | bwd_allreduce_microstep: 19.38 | step_microstep: 181.12 [2024-07-29 15:57:58,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28762.29 | bwd: 40991.98 | bwd_inner: 39862.37 | bwd_allreduce: 1129.14 | step: 181.81 32%|███▏ | 217/671 [4:14:44<8:47:25, 69.70s/it] {'loss': 1.189, 'learning_rate': 1.5838682455081657e-05, 'epoch': 0.32} 32%|███▏ | 217/671 [4:14:44<8:47:25, 69.70s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token 
length: 3672 [2024-07-29 15:58:07,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3470.93 | bwd_microstep: 5142.46 | bwd_inner_microstep: 5074.99 | bwd_allreduce_microstep: 67.40 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2225 [2024-07-29 15:58:15,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3377.31 | bwd_microstep: 5193.21 | bwd_inner_microstep: 4789.86 | bwd_allreduce_microstep: 403.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-29 15:58:24,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.22 | bwd_microstep: 4997.99 | bwd_inner_microstep: 4975.96 | bwd_allreduce_microstep: 21.97 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635 [2024-07-29 15:58:32,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.14 | bwd_microstep: 4831.85 | bwd_inner_microstep: 4791.57 | bwd_allreduce_microstep: 40.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-29 15:58:41,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.67 | bwd_microstep: 5049.36 | bwd_inner_microstep: 4991.51 | bwd_allreduce_microstep: 57.78 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-29 15:58:50,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.87 | bwd_microstep: 5015.22 | bwd_inner_microstep: 4991.91 | bwd_allreduce_microstep: 23.24 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680 [2024-07-29 15:58:58,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.55 | bwd_microstep: 5067.64 | bwd_inner_microstep: 5007.85 | bwd_allreduce_microstep: 59.72 | step_microstep: 0.08 dynamic 
ViT batch size: 8, images per sample: 4.0, dynamic token length: 2110 [2024-07-29 15:59:06,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 15:59:06,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3025.47 | bwd_microstep: 4913.70 | bwd_inner_microstep: 4537.18 | bwd_allreduce_microstep: 376.46 | step_microstep: 182.34 [2024-07-29 15:59:06,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27697.05 | bwd: 40211.41 | bwd_inner: 39160.77 | bwd_allreduce: 1050.16 | step: 182.95 32%|███▏ | 218/671 [4:15:52<8:42:56, 69.26s/it] {'loss': 1.2081, 'learning_rate': 1.5799375995821116e-05, 'epoch': 0.32} 32%|███▏ | 218/671 [4:15:52<8:42:56, 69.26s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2261 [2024-07-29 15:59:15,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.03 | bwd_microstep: 5195.02 | bwd_inner_microstep: 4795.16 | bwd_allreduce_microstep: 399.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3854 [2024-07-29 15:59:24,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.34 | bwd_microstep: 5071.04 | bwd_inner_microstep: 5033.35 | bwd_allreduce_microstep: 37.62 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3789 [2024-07-29 15:59:33,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.18 | bwd_microstep: 5025.50 | bwd_inner_microstep: 5006.13 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3650 [2024-07-29 15:59:41,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.48 | bwd_microstep: 5218.67 | bwd_inner_microstep: 5128.24 | bwd_allreduce_microstep: 90.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-07-29 
15:59:50,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.11 | bwd_microstep: 5176.84 | bwd_inner_microstep: 5102.67 | bwd_allreduce_microstep: 74.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-29 15:59:59,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.43 | bwd_microstep: 5159.62 | bwd_inner_microstep: 5087.14 | bwd_allreduce_microstep: 72.41 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3674 [2024-07-29 16:00:08,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.94 | bwd_microstep: 5178.94 | bwd_inner_microstep: 5090.73 | bwd_allreduce_microstep: 88.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705 [2024-07-29 16:00:17,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-29 16:00:17,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.52 | bwd_microstep: 5101.16 | bwd_inner_microstep: 5036.02 | bwd_allreduce_microstep: 65.08 | step_microstep: 181.53 [2024-07-29 16:00:17,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28937.91 | bwd: 41126.77 | bwd_inner: 40279.38 | bwd_allreduce: 846.93 | step: 182.12 33%|███▎ | 219/671 [4:17:03<8:44:20, 69.60s/it] {'loss': 1.2584, 'learning_rate': 1.5759934063340627e-05, 'epoch': 0.33} 33%|███▎ | 219/671 [4:17:03<8:44:20, 69.60s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2377 [2024-07-29 16:00:26,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.44 | bwd_microstep: 5231.63 | bwd_inner_microstep: 4827.40 | bwd_allreduce_microstep: 404.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3580 [2024-07-29 16:00:35,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.00 
| bwd_microstep: 5305.03 | bwd_inner_microstep: 5200.91 | bwd_allreduce_microstep: 104.05 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3777 [2024-07-29 16:00:44,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3888.04 | bwd_microstep: 5194.50 | bwd_inner_microstep: 5138.03 | bwd_allreduce_microstep: 56.41 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3788 [2024-07-29 16:00:52,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.10 | bwd_microstep: 4836.19 | bwd_inner_microstep: 4816.87 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2167 [2024-07-29 16:01:00,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3029.61 | bwd_microstep: 4985.70 | bwd_inner_microstep: 4599.87 | bwd_allreduce_microstep: 385.76 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2162 [2024-07-29 16:01:09,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.69 | bwd_microstep: 5188.62 | bwd_inner_microstep: 4782.62 | bwd_allreduce_microstep: 405.93 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171 [2024-07-29 16:01:17,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.42 | bwd_microstep: 5134.43 | bwd_inner_microstep: 4734.98 | bwd_allreduce_microstep: 399.38 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3708 [2024-07-29 16:01:26,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 16:01:26,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.24 | bwd_microstep: 5396.48 | bwd_inner_microstep: 5146.40 | bwd_allreduce_microstep: 250.01 | step_microstep: 180.74 [2024-07-29 
16:01:26,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27932.43 | bwd: 41272.56 | bwd_inner: 39247.02 | bwd_allreduce: 2025.07 | step: 181.45 33%|███▎ | 220/671 [4:18:12<8:43:01, 69.58s/it] {'loss': 1.2219, 'learning_rate': 1.5720357579002346e-05, 'epoch': 0.33} 33%|███▎ | 220/671 [4:18:12<8:43:01, 69.58s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3833 [2024-07-29 16:01:35,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.55 | bwd_microstep: 5154.12 | bwd_inner_microstep: 5113.88 | bwd_allreduce_microstep: 40.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3589 [2024-07-29 16:01:44,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3653.36 | bwd_microstep: 5296.35 | bwd_inner_microstep: 5198.60 | bwd_allreduce_microstep: 97.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3825 [2024-07-29 16:01:53,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.39 | bwd_microstep: 5080.89 | bwd_inner_microstep: 5055.52 | bwd_allreduce_microstep: 25.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3740 [2024-07-29 16:02:02,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.10 | bwd_microstep: 5100.33 | bwd_inner_microstep: 5054.49 | bwd_allreduce_microstep: 45.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-07-29 16:02:10,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.48 | bwd_microstep: 5179.41 | bwd_inner_microstep: 5121.79 | bwd_allreduce_microstep: 57.55 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3685 [2024-07-29 16:02:19,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3146.20 | bwd_microstep: 
4929.35 | bwd_inner_microstep: 4880.18 | bwd_allreduce_microstep: 49.10 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2147 [2024-07-29 16:02:27,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.04 | bwd_microstep: 5214.30 | bwd_inner_microstep: 4809.91 | bwd_allreduce_microstep: 404.33 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-07-29 16:02:36,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 16:02:36,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.38 | bwd_microstep: 5018.57 | bwd_inner_microstep: 4964.29 | bwd_allreduce_microstep: 54.22 | step_microstep: 181.60 [2024-07-29 16:02:36,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28501.40 | bwd: 40973.30 | bwd_inner: 40198.59 | bwd_allreduce: 774.24 | step: 182.19 33%|███▎ | 221/671 [4:19:22<8:42:21, 69.65s/it] {'loss': 1.1252, 'learning_rate': 1.568064746731156e-05, 'epoch': 0.33} 33%|███▎ | 221/671 [4:19:22<8:42:21, 69.65s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4001 [2024-07-29 16:02:45,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.58 | bwd_microstep: 5282.84 | bwd_inner_microstep: 5244.92 | bwd_allreduce_microstep: 37.86 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3584 [2024-07-29 16:02:54,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.28 | bwd_microstep: 5184.40 | bwd_inner_microstep: 5074.20 | bwd_allreduce_microstep: 110.14 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768 [2024-07-29 16:03:03,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.17 | bwd_microstep: 4993.24 | bwd_inner_microstep: 4973.94 | bwd_allreduce_microstep: 19.22 | step_microstep: 
0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3799 [2024-07-29 16:03:11,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.66 | bwd_microstep: 5029.58 | bwd_inner_microstep: 5010.29 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-29 16:03:20,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.33 | bwd_microstep: 5209.93 | bwd_inner_microstep: 4806.70 | bwd_allreduce_microstep: 403.16 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-07-29 16:03:29,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.64 | bwd_microstep: 5041.61 | bwd_inner_microstep: 5014.54 | bwd_allreduce_microstep: 27.01 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-29 16:03:37,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3205.67 | bwd_microstep: 4765.20 | bwd_inner_microstep: 4733.07 | bwd_allreduce_microstep: 32.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635 [2024-07-29 16:03:45,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 16:03:45,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3181.00 | bwd_microstep: 4736.48 | bwd_inner_microstep: 4706.90 | bwd_allreduce_microstep: 29.52 | step_microstep: 183.19 [2024-07-29 16:03:45,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28472.23 | bwd: 40243.26 | bwd_inner: 39564.50 | bwd_allreduce: 678.29 | step: 183.86 33%|███▎ | 222/671 [4:20:31<8:39:51, 69.47s/it] {'loss': 1.2149, 'learning_rate': 1.5640804655895086e-05, 'epoch': 0.33} 33%|███▎ | 222/671 [4:20:31<8:39:51, 69.47s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2071 
[2024-07-29 16:03:54,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.94 | bwd_microstep: 5364.90 | bwd_inner_microstep: 4951.28 | bwd_allreduce_microstep: 413.56 | step_microstep: 0.11 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3773 [2024-07-29 16:04:03,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.19 | bwd_microstep: 5111.78 | bwd_inner_microstep: 5066.88 | bwd_allreduce_microstep: 44.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3600 [2024-07-29 16:04:12,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.76 | bwd_microstep: 5084.49 | bwd_inner_microstep: 5013.28 | bwd_allreduce_microstep: 71.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3784 [2024-07-29 16:04:20,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.68 | bwd_microstep: 5157.52 | bwd_inner_microstep: 5110.12 | bwd_allreduce_microstep: 47.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2178 [2024-07-29 16:04:29,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.46 | bwd_microstep: 5196.50 | bwd_inner_microstep: 4794.32 | bwd_allreduce_microstep: 402.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3739 [2024-07-29 16:04:38,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.91 | bwd_microstep: 5025.01 | bwd_inner_microstep: 4987.63 | bwd_allreduce_microstep: 37.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-29 16:04:46,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.40 | bwd_microstep: 5101.63 | bwd_inner_microstep: 5039.78 | bwd_allreduce_microstep: 61.78 | step_microstep: 0.08 dynamic ViT batch 
size: 26, images per sample: 13.0, dynamic token length: 3688 [2024-07-29 16:04:55,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 16:04:55,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3698.78 | bwd_microstep: 4918.80 | bwd_inner_microstep: 4894.54 | bwd_allreduce_microstep: 24.19 | step_microstep: 181.88 [2024-07-29 16:04:55,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28812.02 | bwd: 40960.61 | bwd_inner: 39857.75 | bwd_allreduce: 1102.39 | step: 182.48 33%|███▎ | 223/671 [4:21:41<8:40:07, 69.66s/it] {'loss': 1.144, 'learning_rate': 1.5600830075479604e-05, 'epoch': 0.33} 33%|███▎ | 223/671 [4:21:41<8:40:07, 69.66s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3951 [2024-07-29 16:05:04,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3650.75 | bwd_microstep: 5210.67 | bwd_inner_microstep: 5174.75 | bwd_allreduce_microstep: 35.86 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2289 [2024-07-29 16:05:13,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.87 | bwd_microstep: 5231.17 | bwd_inner_microstep: 4823.69 | bwd_allreduce_microstep: 407.40 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2091 [2024-07-29 16:05:22,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.43 | bwd_microstep: 5217.72 | bwd_inner_microstep: 4811.67 | bwd_allreduce_microstep: 405.99 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3749 [2024-07-29 16:05:30,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.03 | bwd_microstep: 4989.44 | bwd_inner_microstep: 4970.08 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2139 [2024-07-29 16:05:39,807] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.81 | bwd_microstep: 5249.31 | bwd_inner_microstep: 4842.46 | bwd_allreduce_microstep: 406.79 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-07-29 16:05:48,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.01 | bwd_microstep: 5192.50 | bwd_inner_microstep: 4787.53 | bwd_allreduce_microstep: 404.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-07-29 16:05:57,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.80 | bwd_microstep: 5043.89 | bwd_inner_microstep: 4986.76 | bwd_allreduce_microstep: 57.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-07-29 16:06:05,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-29 16:06:05,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.14 | bwd_microstep: 4988.91 | bwd_inner_microstep: 4940.88 | bwd_allreduce_microstep: 47.96 | step_microstep: 180.98 [2024-07-29 16:06:05,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28689.74 | bwd: 41123.60 | bwd_inner: 39337.76 | bwd_allreduce: 1785.36 | step: 181.57 33%|███▎ | 224/671 [4:22:51<8:40:02, 69.80s/it] {'loss': 1.1205, 'learning_rate': 1.5560724659869905e-05, 'epoch': 0.33} 33%|███▎ | 224/671 [4:22:51<8:40:02, 69.80s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3898 [2024-07-29 16:06:14,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3786.49 | bwd_microstep: 5152.88 | bwd_inner_microstep: 5133.75 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2316 [2024-07-29 16:06:23,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.02 | 
bwd_microstep: 5339.36 | bwd_inner_microstep: 4923.88 | bwd_allreduce_microstep: 415.42 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2250 [2024-07-29 16:06:31,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3075.72 | bwd_microstep: 5025.65 | bwd_inner_microstep: 4638.85 | bwd_allreduce_microstep: 386.74 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3735 [2024-07-29 16:06:40,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.62 | bwd_microstep: 5135.61 | bwd_inner_microstep: 5072.55 | bwd_allreduce_microstep: 63.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3785 [2024-07-29 16:06:49,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.64 | bwd_microstep: 5095.34 | bwd_inner_microstep: 5051.33 | bwd_allreduce_microstep: 43.94 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2108 [2024-07-29 16:06:57,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2992.27 | bwd_microstep: 4851.80 | bwd_inner_microstep: 4478.71 | bwd_allreduce_microstep: 373.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-29 16:07:06,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.57 | bwd_microstep: 4994.29 | bwd_inner_microstep: 4974.92 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2154 [2024-07-29 16:07:14,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 16:07:14,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.55 | bwd_microstep: 5127.81 | bwd_inner_microstep: 4728.39 | bwd_allreduce_microstep: 399.36 | step_microstep: 180.87 [2024-07-29 
16:07:14,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27925.79 | bwd: 40722.74 | bwd_inner: 39002.31 | bwd_allreduce: 1719.95 | step: 181.46 34%|███▎ | 225/671 [4:24:00<8:37:01, 69.55s/it] {'loss': 1.2103, 'learning_rate': 1.5520489345927095e-05, 'epoch': 0.33} 34%|███▎ | 225/671 [4:24:00<8:37:01, 69.55s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3894 [2024-07-29 16:07:23,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3777.21 | bwd_microstep: 5141.82 | bwd_inner_microstep: 5122.71 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2326 [2024-07-29 16:07:32,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.41 | bwd_microstep: 5293.40 | bwd_inner_microstep: 4883.50 | bwd_allreduce_microstep: 409.84 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2196 [2024-07-29 16:07:41,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.91 | bwd_microstep: 5206.07 | bwd_inner_microstep: 4799.23 | bwd_allreduce_microstep: 406.78 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3754 [2024-07-29 16:07:50,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.84 | bwd_microstep: 5113.33 | bwd_inner_microstep: 5049.42 | bwd_allreduce_microstep: 63.85 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3739 [2024-07-29 16:07:58,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.10 | bwd_microstep: 5099.41 | bwd_inner_microstep: 5020.99 | bwd_allreduce_microstep: 78.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3690 [2024-07-29 16:08:07,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.11 | bwd_microstep: 
4979.26 | bwd_inner_microstep: 4938.50 | bwd_allreduce_microstep: 40.69 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-29 16:08:15,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3213.17 | bwd_microstep: 4777.25 | bwd_inner_microstep: 4744.60 | bwd_allreduce_microstep: 32.58 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-07-29 16:08:24,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.45 [2024-07-29 16:08:24,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.29 | bwd_microstep: 5122.38 | bwd_inner_microstep: 4724.88 | bwd_allreduce_microstep: 397.43 | step_microstep: 181.33 [2024-07-29 16:08:24,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28348.94 | bwd: 40732.90 | bwd_inner: 39283.76 | bwd_allreduce: 1448.66 | step: 181.91 34%|███▎ | 226/671 [4:25:10<8:35:31, 69.51s/it] {'loss': 1.1597, 'learning_rate': 1.5480125073546705e-05, 'epoch': 0.34} 34%|███▎ | 226/671 [4:25:10<8:35:31, 69.51s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2449 [2024-07-29 16:08:33,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.48 | bwd_microstep: 5773.54 | bwd_inner_microstep: 5344.27 | bwd_allreduce_microstep: 429.20 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3054 [2024-07-29 16:08:42,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.33 | bwd_microstep: 5164.82 | bwd_inner_microstep: 4860.92 | bwd_allreduce_microstep: 303.84 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3797 [2024-07-29 16:08:51,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.48 | bwd_microstep: 5189.87 | bwd_inner_microstep: 5113.08 | bwd_allreduce_microstep: 76.71 | 
step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2221 [2024-07-29 16:08:59,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3061.37 | bwd_microstep: 5037.08 | bwd_inner_microstep: 4649.86 | bwd_allreduce_microstep: 387.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3765 [2024-07-29 16:09:08,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.35 | bwd_microstep: 5176.30 | bwd_inner_microstep: 5119.76 | bwd_allreduce_microstep: 56.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707 [2024-07-29 16:09:17,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.60 | bwd_microstep: 5032.01 | bwd_inner_microstep: 4992.21 | bwd_allreduce_microstep: 39.73 | step_microstep: 0.18 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2138 [2024-07-29 16:09:25,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3495.49 | bwd_microstep: 5107.17 | bwd_inner_microstep: 4710.86 | bwd_allreduce_microstep: 396.24 | step_microstep: 0.07 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156 [2024-07-29 16:09:34,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 16:09:34,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3494.41 | bwd_microstep: 5072.04 | bwd_inner_microstep: 4677.67 | bwd_allreduce_microstep: 394.31 | step_microstep: 181.58 [2024-07-29 16:09:34,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28248.42 | bwd: 41552.81 | bwd_inner: 39468.57 | bwd_allreduce: 2083.76 | step: 182.26 34%|███▍ | 227/671 [4:26:20<8:35:44, 69.70s/it] {'loss': 1.217, 'learning_rate': 1.5439632785636707e-05, 'epoch': 0.34} 34%|███▍ | 227/671 [4:26:20<8:35:44, 69.70s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token 
length: 3771 [2024-07-29 16:09:43,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3342.70 | bwd_microstep: 5544.29 | bwd_inner_microstep: 5482.18 | bwd_allreduce_microstep: 62.05 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3830 [2024-07-29 16:09:52,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.76 | bwd_microstep: 5202.91 | bwd_inner_microstep: 5149.63 | bwd_allreduce_microstep: 53.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743 [2024-07-29 16:10:00,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.86 | bwd_microstep: 5019.74 | bwd_inner_microstep: 4994.85 | bwd_allreduce_microstep: 24.83 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3761 [2024-07-29 16:10:09,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.52 | bwd_microstep: 5007.94 | bwd_inner_microstep: 4988.47 | bwd_allreduce_microstep: 19.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-29 16:10:17,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3230.84 | bwd_microstep: 4856.04 | bwd_inner_microstep: 4814.71 | bwd_allreduce_microstep: 41.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705 [2024-07-29 16:10:26,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.99 | bwd_microstep: 5143.23 | bwd_inner_microstep: 5075.30 | bwd_allreduce_microstep: 67.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680 [2024-07-29 16:10:35,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.70 | bwd_microstep: 5074.59 | bwd_inner_microstep: 5015.96 | bwd_allreduce_microstep: 58.57 | step_microstep: 0.08 dynamic 
ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-29 16:10:44,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-29 16:10:44,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.42 | bwd_microstep: 4997.00 | bwd_inner_microstep: 4949.42 | bwd_allreduce_microstep: 47.51 | step_microstep: 180.72 [2024-07-29 16:10:44,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28454.69 | bwd: 40845.72 | bwd_inner: 40470.45 | bwd_allreduce: 374.81 | step: 181.30 34%|███▍ | 228/671 [4:27:30<8:34:26, 69.68s/it] {'loss': 1.148, 'learning_rate': 1.539901342809554e-05, 'epoch': 0.34} 34%|███▍ | 228/671 [4:27:30<8:34:26, 69.68s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3953 [2024-07-29 16:10:53,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.54 | bwd_microstep: 5317.26 | bwd_inner_microstep: 5262.96 | bwd_allreduce_microstep: 54.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737 [2024-07-29 16:11:01,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.95 | bwd_microstep: 5070.51 | bwd_inner_microstep: 5039.67 | bwd_allreduce_microstep: 30.78 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201 [2024-07-29 16:11:10,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.27 | bwd_microstep: 5188.29 | bwd_inner_microstep: 4787.21 | bwd_allreduce_microstep: 401.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3791 [2024-07-29 16:11:19,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.73 | bwd_microstep: 5022.62 | bwd_inner_microstep: 4988.01 | bwd_allreduce_microstep: 34.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3628 [2024-07-29 
16:11:27,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.60 | bwd_microstep: 5091.05 | bwd_inner_microstep: 5008.98 | bwd_allreduce_microstep: 82.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3744 [2024-07-29 16:11:36,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.36 | bwd_microstep: 5080.83 | bwd_inner_microstep: 5036.05 | bwd_allreduce_microstep: 44.71 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2191 [2024-07-29 16:11:45,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.35 | bwd_microstep: 5101.47 | bwd_inner_microstep: 4706.04 | bwd_allreduce_microstep: 395.35 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2166 [2024-07-29 16:11:54,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 16:11:54,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.27 | bwd_microstep: 5091.32 | bwd_inner_microstep: 4695.05 | bwd_allreduce_microstep: 396.21 | step_microstep: 181.15 [2024-07-29 16:11:54,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28757.96 | bwd: 40963.34 | bwd_inner: 39523.91 | bwd_allreduce: 1438.95 | step: 181.74 34%|███▍ | 229/671 [4:28:40<8:34:06, 69.79s/it] {'loss': 1.2158, 'learning_rate': 1.5358267949789968e-05, 'epoch': 0.34} 34%|███▍ | 229/671 [4:28:40<8:34:06, 69.79s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2390 [2024-07-29 16:12:02,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.02 | bwd_microstep: 5286.53 | bwd_inner_microstep: 4882.91 | bwd_allreduce_microstep: 403.55 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3805 [2024-07-29 16:12:11,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3731.07 | bwd_microstep: 5036.32 | bwd_inner_microstep: 5017.00 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3768 [2024-07-29 16:12:20,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3421.09 | bwd_microstep: 4912.86 | bwd_inner_microstep: 4877.15 | bwd_allreduce_microstep: 35.65 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3653 [2024-07-29 16:12:28,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.75 | bwd_microstep: 5159.06 | bwd_inner_microstep: 5062.34 | bwd_allreduce_microstep: 96.66 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-29 16:12:37,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.72 | bwd_microstep: 5154.77 | bwd_inner_microstep: 5077.47 | bwd_allreduce_microstep: 77.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644 [2024-07-29 16:12:45,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3200.92 | bwd_microstep: 4788.98 | bwd_inner_microstep: 4752.09 | bwd_allreduce_microstep: 36.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3736 [2024-07-29 16:12:54,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.14 | bwd_microstep: 5035.25 | bwd_inner_microstep: 4994.45 | bwd_allreduce_microstep: 40.74 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696 [2024-07-29 16:13:03,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.76 [2024-07-29 16:13:03,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.74 | bwd_microstep: 4903.15 | bwd_inner_microstep: 4883.78 | bwd_allreduce_microstep: 19.29 | step_microstep: 368.39 [2024-07-29 
16:13:03,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28452.37 | bwd: 40276.91 | bwd_inner: 39547.13 | bwd_allreduce: 729.30 | step: 369.08 34%|███▍ | 230/671 [4:29:49<8:31:45, 69.63s/it] {'loss': 1.2159, 'learning_rate': 1.5317397302532933e-05, 'epoch': 0.34} 34%|███▍ | 230/671 [4:29:49<8:31:45, 69.63s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3913 [2024-07-29 16:13:12,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3792.48 | bwd_microstep: 5169.11 | bwd_inner_microstep: 5149.96 | bwd_allreduce_microstep: 19.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2311 [2024-07-29 16:13:21,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.58 | bwd_microstep: 5279.61 | bwd_inner_microstep: 4869.16 | bwd_allreduce_microstep: 410.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-29 16:13:29,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.38 | bwd_microstep: 5113.32 | bwd_inner_microstep: 5040.74 | bwd_allreduce_microstep: 72.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-29 16:13:37,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3212.85 | bwd_microstep: 4827.58 | bwd_inner_microstep: 4786.03 | bwd_allreduce_microstep: 41.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3714 [2024-07-29 16:13:46,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.67 | bwd_microstep: 4962.34 | bwd_inner_microstep: 4929.62 | bwd_allreduce_microstep: 32.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3694 [2024-07-29 16:13:55,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.77 | bwd_microstep: 
5172.76 | bwd_inner_microstep: 5100.54 | bwd_allreduce_microstep: 72.16 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172 [2024-07-29 16:14:03,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3468.68 | bwd_microstep: 5031.95 | bwd_inner_microstep: 4640.93 | bwd_allreduce_microstep: 390.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-29 16:14:12,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-29 16:14:12,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.64 | bwd_microstep: 4889.06 | bwd_inner_microstep: 4869.67 | bwd_allreduce_microstep: 19.32 | step_microstep: 181.22 [2024-07-29 16:14:12,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28499.94 | bwd: 40445.71 | bwd_inner: 39386.59 | bwd_allreduce: 1058.65 | step: 181.79 34%|███▍ | 231/671 [4:30:58<8:29:50, 69.52s/it] {'loss': 1.1752, 'learning_rate': 1.527640244106133e-05, 'epoch': 0.34} 34%|███▍ | 231/671 [4:30:58<8:29:50, 69.52s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-29 16:14:21,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3659.05 | bwd_microstep: 5321.09 | bwd_inner_microstep: 5226.88 | bwd_allreduce_microstep: 94.14 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741 [2024-07-29 16:14:30,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3778.23 | bwd_microstep: 5051.46 | bwd_inner_microstep: 5021.05 | bwd_allreduce_microstep: 30.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3781 [2024-07-29 16:14:39,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.41 | bwd_microstep: 5179.16 | bwd_inner_microstep: 5127.44 | bwd_allreduce_microstep: 51.66 | 
step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2216 [2024-07-29 16:14:48,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.12 | bwd_microstep: 5208.37 | bwd_inner_microstep: 4803.65 | bwd_allreduce_microstep: 404.66 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2213 [2024-07-29 16:14:56,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.27 | bwd_microstep: 5159.70 | bwd_inner_microstep: 4758.24 | bwd_allreduce_microstep: 401.40 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-07-29 16:15:05,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.97 | bwd_microstep: 5035.43 | bwd_inner_microstep: 4970.64 | bwd_allreduce_microstep: 64.73 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3661 [2024-07-29 16:15:14,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3690.89 | bwd_microstep: 4906.71 | bwd_inner_microstep: 4881.90 | bwd_allreduce_microstep: 24.74 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663 [2024-07-29 16:15:22,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-29 16:15:22,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3198.62 | bwd_microstep: 4711.28 | bwd_inner_microstep: 4686.94 | bwd_allreduce_microstep: 24.28 | step_microstep: 181.28 [2024-07-29 16:15:22,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28573.46 | bwd: 40573.18 | bwd_inner: 39476.66 | bwd_allreduce: 1096.05 | step: 181.90 35%|███▍ | 232/671 [4:32:08<8:28:35, 69.51s/it] {'loss': 1.2458, 'learning_rate': 1.5235284323013674e-05, 'epoch': 0.35} 35%|███▍ | 232/671 [4:32:08<8:28:35, 69.51s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token 
length: 3938 [2024-07-29 16:15:31,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.52 | bwd_microstep: 5323.10 | bwd_inner_microstep: 5266.01 | bwd_allreduce_microstep: 57.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3589 [2024-07-29 16:15:39,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.36 | bwd_microstep: 5144.43 | bwd_inner_microstep: 5070.33 | bwd_allreduce_microstep: 74.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-29 16:15:48,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.87 | bwd_microstep: 5169.64 | bwd_inner_microstep: 5090.36 | bwd_allreduce_microstep: 79.21 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2241 [2024-07-29 16:15:57,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.97 | bwd_microstep: 5150.20 | bwd_inner_microstep: 4750.61 | bwd_allreduce_microstep: 399.52 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2182 [2024-07-29 16:16:06,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.81 | bwd_microstep: 5155.82 | bwd_inner_microstep: 4753.88 | bwd_allreduce_microstep: 401.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642 [2024-07-29 16:16:14,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.58 | bwd_microstep: 5047.79 | bwd_inner_microstep: 4983.10 | bwd_allreduce_microstep: 64.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-07-29 16:16:23,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.95 | bwd_microstep: 4999.94 | bwd_inner_microstep: 4949.87 | bwd_allreduce_microstep: 50.01 | step_microstep: 0.08 dynamic 
ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-07-29 16:16:32,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-29 16:16:32,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.47 | bwd_microstep: 4978.64 | bwd_inner_microstep: 4924.08 | bwd_allreduce_microstep: 54.49 | step_microstep: 180.91 [2024-07-29 16:16:32,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28608.44 | bwd: 40969.54 | bwd_inner: 39788.17 | bwd_allreduce: 1180.91 | step: 181.58 35%|███▍ | 233/671 [4:33:17<8:28:18, 69.63s/it] {'loss': 1.1763, 'learning_rate': 1.5194043908907774e-05, 'epoch': 0.35} 35%|███▍ | 233/671 [4:33:17<8:28:18, 69.63s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2114 [2024-07-29 16:16:40,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.76 | bwd_microstep: 5326.28 | bwd_inner_microstep: 4915.06 | bwd_allreduce_microstep: 411.15 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3772 [2024-07-29 16:16:49,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3798.34 | bwd_microstep: 5069.68 | bwd_inner_microstep: 5041.21 | bwd_allreduce_microstep: 28.41 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3753 [2024-07-29 16:16:58,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.03 | bwd_microstep: 4981.80 | bwd_inner_microstep: 4962.35 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3616 [2024-07-29 16:17:06,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3202.29 | bwd_microstep: 4827.74 | bwd_inner_microstep: 4788.03 | bwd_allreduce_microstep: 39.65 | step_microstep: 0.08 dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1698 [2024-07-29 
16:17:15,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.85 | bwd_microstep: 5207.28 | bwd_inner_microstep: 4804.79 | bwd_allreduce_microstep: 402.42 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193 [2024-07-29 16:17:24,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.60 | bwd_microstep: 5245.47 | bwd_inner_microstep: 4839.67 | bwd_allreduce_microstep: 405.74 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2153 [2024-07-29 16:17:32,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.04 | bwd_microstep: 5101.35 | bwd_inner_microstep: 4706.76 | bwd_allreduce_microstep: 394.52 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3690 [2024-07-29 16:17:41,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-29 16:17:41,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3840.53 | bwd_microstep: 5068.15 | bwd_inner_microstep: 4992.42 | bwd_allreduce_microstep: 75.66 | step_microstep: 180.94 [2024-07-29 16:17:41,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28702.36 | bwd: 40827.74 | bwd_inner: 39050.23 | bwd_allreduce: 1777.04 | step: 181.51 35%|███▍ | 234/671 [4:34:27<8:27:37, 69.70s/it] {'loss': 1.1833, 'learning_rate': 1.515268216211825e-05, 'epoch': 0.35} 35%|███▍ | 234/671 [4:34:27<8:27:37, 69.70s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3938 [2024-07-29 16:17:50,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3183.70 | bwd_microstep: 5068.06 | bwd_inner_microstep: 5024.73 | bwd_allreduce_microstep: 43.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3807 [2024-07-29 16:17:59,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3783.77 
| bwd_microstep: 5095.75 | bwd_inner_microstep: 5069.23 | bwd_allreduce_microstep: 26.46 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3780 [2024-07-29 16:18:07,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.58 | bwd_microstep: 5024.41 | bwd_inner_microstep: 5005.07 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3641 [2024-07-29 16:18:16,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.15 | bwd_microstep: 5073.09 | bwd_inner_microstep: 5003.17 | bwd_allreduce_microstep: 69.85 | step_microstep: 0.09 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2112 [2024-07-29 16:18:25,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.58 | bwd_microstep: 5264.31 | bwd_inner_microstep: 4855.35 | bwd_allreduce_microstep: 408.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-29 16:18:33,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.48 | bwd_microstep: 4966.24 | bwd_inner_microstep: 4922.01 | bwd_allreduce_microstep: 44.17 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2212 [2024-07-29 16:18:42,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.64 | bwd_microstep: 5129.65 | bwd_inner_microstep: 4730.09 | bwd_allreduce_microstep: 399.49 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2130 [2024-07-29 16:18:51,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 16:18:51,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.52 | bwd_microstep: 5092.57 | bwd_inner_microstep: 4697.45 | bwd_allreduce_microstep: 395.06 | step_microstep: 182.69 [2024-07-29 
16:18:51,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28359.33 | bwd: 40714.06 | bwd_inner: 39307.04 | bwd_allreduce: 1406.55 | step: 183.29 35%|███▌ | 235/671 [4:35:37<8:25:49, 69.61s/it] {'loss': 1.1247, 'learning_rate': 1.5111200048854055e-05, 'epoch': 0.35} 35%|███▌ | 235/671 [4:35:37<8:25:49, 69.61s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3957 [2024-07-29 16:19:00,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.70 | bwd_microstep: 5331.87 | bwd_inner_microstep: 5274.91 | bwd_allreduce_microstep: 56.90 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2225 [2024-07-29 16:19:09,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.89 | bwd_microstep: 5253.67 | bwd_inner_microstep: 4845.63 | bwd_allreduce_microstep: 407.97 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3592 [2024-07-29 16:19:17,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.88 | bwd_microstep: 5177.05 | bwd_inner_microstep: 5092.13 | bwd_allreduce_microstep: 84.85 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2194 [2024-07-29 16:19:26,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.17 | bwd_microstep: 5213.60 | bwd_inner_microstep: 4807.82 | bwd_allreduce_microstep: 405.72 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3713 [2024-07-29 16:19:35,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.70 | bwd_microstep: 5176.74 | bwd_inner_microstep: 5120.21 | bwd_allreduce_microstep: 56.46 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-29 16:19:44,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.62 | bwd_microstep: 
5089.57 | bwd_inner_microstep: 4695.58 | bwd_allreduce_microstep: 393.92 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-29 16:19:52,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.70 | bwd_microstep: 4997.90 | bwd_inner_microstep: 4943.69 | bwd_allreduce_microstep: 54.13 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2137 [2024-07-29 16:20:01,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 16:20:01,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.36 | bwd_microstep: 5118.73 | bwd_inner_microstep: 4722.64 | bwd_allreduce_microstep: 396.02 | step_microstep: 368.03 [2024-07-29 16:20:01,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28627.93 | bwd: 41359.10 | bwd_inner: 39502.55 | bwd_allreduce: 1856.08 | step: 368.71 35%|███▌ | 236/671 [4:36:47<8:26:36, 69.88s/it] {'loss': 1.1844, 'learning_rate': 1.5069598538135905e-05, 'epoch': 0.35} 35%|███▌ | 236/671 [4:36:47<8:26:36, 69.88s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3917 [2024-07-29 16:20:10,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3321.87 | bwd_microstep: 4961.73 | bwd_inner_microstep: 4942.57 | bwd_allreduce_microstep: 19.09 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2291 [2024-07-29 16:20:18,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.45 | bwd_microstep: 5298.88 | bwd_inner_microstep: 4886.77 | bwd_allreduce_microstep: 412.05 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2088 [2024-07-29 16:20:27,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3063.86 | bwd_microstep: 5089.16 | bwd_inner_microstep: 4696.60 | bwd_allreduce_microstep: 392.50 | 
step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-29 16:20:35,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.73 | bwd_microstep: 5045.37 | bwd_inner_microstep: 5016.97 | bwd_allreduce_microstep: 28.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-07-29 16:20:44,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.42 | bwd_microstep: 5188.55 | bwd_inner_microstep: 5134.92 | bwd_allreduce_microstep: 53.57 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2194 [2024-07-29 16:20:53,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.54 | bwd_microstep: 5233.86 | bwd_inner_microstep: 4826.65 | bwd_allreduce_microstep: 407.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658 [2024-07-29 16:21:02,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.18 | bwd_microstep: 5051.92 | bwd_inner_microstep: 4988.13 | bwd_allreduce_microstep: 63.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677 [2024-07-29 16:21:11,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.74 [2024-07-29 16:21:11,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.79 | bwd_microstep: 5011.95 | bwd_inner_microstep: 4960.17 | bwd_allreduce_microstep: 51.72 | step_microstep: 181.08 [2024-07-29 16:21:11,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28012.76 | bwd: 40881.40 | bwd_inner: 39452.69 | bwd_allreduce: 1428.23 | step: 181.67 35%|███▌ | 237/671 [4:37:56<8:24:01, 69.68s/it] {'loss': 1.234, 'learning_rate': 1.5027878601773633e-05, 'epoch': 0.35} 35%|███▌ | 237/671 [4:37:56<8:24:01, 69.68s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token 
length: 3915 [2024-07-29 16:21:19,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3302.94 | bwd_microstep: 4979.46 | bwd_inner_microstep: 4960.37 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2251 [2024-07-29 16:21:27,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3487.86 | bwd_microstep: 5124.98 | bwd_inner_microstep: 4727.52 | bwd_allreduce_microstep: 397.40 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2082 [2024-07-29 16:21:36,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.11 | bwd_microstep: 5256.83 | bwd_inner_microstep: 4848.93 | bwd_allreduce_microstep: 407.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3757 [2024-07-29 16:21:45,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.19 | bwd_microstep: 5102.01 | bwd_inner_microstep: 5055.88 | bwd_allreduce_microstep: 46.06 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3728 [2024-07-29 16:21:54,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.14 | bwd_microstep: 5147.04 | bwd_inner_microstep: 5061.72 | bwd_allreduce_microstep: 85.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3751 [2024-07-29 16:22:03,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.14 | bwd_microstep: 4997.71 | bwd_inner_microstep: 4978.24 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-29 16:22:11,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3022.72 | bwd_microstep: 4928.05 | bwd_inner_microstep: 4549.33 | bwd_allreduce_microstep: 378.65 | step_microstep: 0.08 dynamic 
ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-07-29 16:22:19,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 16:22:19,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.04 | bwd_microstep: 5083.99 | bwd_inner_microstep: 4690.86 | bwd_allreduce_microstep: 393.07 | step_microstep: 183.20 [2024-07-29 16:22:19,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27848.03 | bwd: 40620.02 | bwd_inner: 38872.78 | bwd_allreduce: 1746.75 | step: 183.79 35%|███▌ | 238/671 [4:39:05<8:20:57, 69.42s/it] {'loss': 1.1977, 'learning_rate': 1.4986041214343487e-05, 'epoch': 0.35} 35%|███▌ | 238/671 [4:39:05<8:20:57, 69.42s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2345 [2024-07-29 16:22:28,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.31 | bwd_microstep: 5373.22 | bwd_inner_microstep: 4959.24 | bwd_allreduce_microstep: 413.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3579 [2024-07-29 16:22:37,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.53 | bwd_microstep: 5157.14 | bwd_inner_microstep: 5073.08 | bwd_allreduce_microstep: 83.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609 [2024-07-29 16:22:46,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.04 | bwd_microstep: 5175.53 | bwd_inner_microstep: 5095.85 | bwd_allreduce_microstep: 79.61 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2239 [2024-07-29 16:22:55,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.28 | bwd_microstep: 5202.74 | bwd_inner_microstep: 4797.65 | bwd_allreduce_microstep: 405.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-29 
16:23:03,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.47 | bwd_microstep: 5110.07 | bwd_inner_microstep: 5044.85 | bwd_allreduce_microstep: 65.14 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725 [2024-07-29 16:23:12,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.74 | bwd_microstep: 4954.71 | bwd_inner_microstep: 4924.47 | bwd_allreduce_microstep: 30.18 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2167 [2024-07-29 16:23:21,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.73 | bwd_microstep: 5103.93 | bwd_inner_microstep: 4708.31 | bwd_allreduce_microstep: 395.56 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-29 16:23:29,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.45 [2024-07-29 16:23:29,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3014.85 | bwd_microstep: 4883.78 | bwd_inner_microstep: 4505.31 | bwd_allreduce_microstep: 378.41 | step_microstep: 180.95 [2024-07-29 16:23:29,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28001.86 | bwd: 40961.11 | bwd_inner: 39108.70 | bwd_allreduce: 1851.94 | step: 181.54 36%|███▌ | 239/671 [4:40:15<8:19:31, 69.38s/it] {'loss': 1.1907, 'learning_rate': 1.494408735316537e-05, 'epoch': 0.36} 36%|███▌ | 239/671 [4:40:15<8:19:31, 69.38s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2401 [2024-07-29 16:23:37,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.84 | bwd_microstep: 5274.87 | bwd_inner_microstep: 4867.17 | bwd_allreduce_microstep: 407.63 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2223 [2024-07-29 16:23:47,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3910.59 
| bwd_microstep: 5174.97 | bwd_inner_microstep: 4769.58 | bwd_allreduce_microstep: 405.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2204 [2024-07-29 16:23:55,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.29 | bwd_microstep: 5153.87 | bwd_inner_microstep: 4755.31 | bwd_allreduce_microstep: 398.50 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3644 [2024-07-29 16:24:04,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.67 | bwd_microstep: 5108.30 | bwd_inner_microstep: 5017.85 | bwd_allreduce_microstep: 90.38 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2194 [2024-07-29 16:24:13,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.43 | bwd_microstep: 5025.00 | bwd_inner_microstep: 4632.66 | bwd_allreduce_microstep: 392.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172 [2024-07-29 16:24:21,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.29 | bwd_microstep: 5183.25 | bwd_inner_microstep: 4780.39 | bwd_allreduce_microstep: 402.79 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-07-29 16:24:30,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.00 | bwd_microstep: 5045.86 | bwd_inner_microstep: 4985.13 | bwd_allreduce_microstep: 60.66 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145 [2024-07-29 16:24:39,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 16:24:39,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.54 | bwd_microstep: 5108.98 | bwd_inner_microstep: 4712.94 | bwd_allreduce_microstep: 395.98 | step_microstep: 181.21 [2024-07-29 
16:24:39,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28837.56 | bwd: 41075.08 | bwd_inner: 38520.98 | bwd_allreduce: 2553.64 | step: 181.90 36%|███▌ | 240/671 [4:41:25<8:20:14, 69.64s/it] {'loss': 1.1919, 'learning_rate': 1.490201799828001e-05, 'epoch': 0.36} 36%|███▌ | 240/671 [4:41:25<8:20:14, 69.64s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4007 [2024-07-29 16:24:48,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3850.71 | bwd_microstep: 5237.92 | bwd_inner_microstep: 5218.78 | bwd_allreduce_microstep: 19.07 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3780 [2024-07-29 16:24:57,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3778.26 | bwd_microstep: 5195.98 | bwd_inner_microstep: 5157.36 | bwd_allreduce_microstep: 38.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3769 [2024-07-29 16:25:06,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.67 | bwd_microstep: 5126.81 | bwd_inner_microstep: 5076.39 | bwd_allreduce_microstep: 50.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3749 [2024-07-29 16:25:14,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.07 | bwd_microstep: 5083.72 | bwd_inner_microstep: 5041.67 | bwd_allreduce_microstep: 41.98 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-29 16:25:23,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.74 | bwd_microstep: 5115.90 | bwd_inner_microstep: 5049.42 | bwd_allreduce_microstep: 66.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-29 16:25:32,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.23 | bwd_microstep: 
5125.41 | bwd_inner_microstep: 5062.77 | bwd_allreduce_microstep: 62.58 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2153 [2024-07-29 16:25:40,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3481.32 | bwd_microstep: 5066.24 | bwd_inner_microstep: 4672.63 | bwd_allreduce_microstep: 393.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-29 16:25:49,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-29 16:25:49,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.73 | bwd_microstep: 4916.46 | bwd_inner_microstep: 4891.92 | bwd_allreduce_microstep: 24.48 | step_microstep: 183.50 [2024-07-29 16:25:49,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29169.65 | bwd: 40868.42 | bwd_inner: 40170.88 | bwd_allreduce: 697.07 | step: 184.08 36%|███▌ | 241/671 [4:42:35<8:20:39, 69.86s/it] {'loss': 1.2155, 'learning_rate': 1.485983413242606e-05, 'epoch': 0.36} 36%|███▌ | 241/671 [4:42:35<8:20:39, 69.86s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2358 [2024-07-29 16:25:57,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3092.40 | bwd_microstep: 5076.44 | bwd_inner_microstep: 4690.58 | bwd_allreduce_microstep: 385.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3811 [2024-07-29 16:26:06,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.48 | bwd_microstep: 5181.25 | bwd_inner_microstep: 5127.82 | bwd_allreduce_microstep: 53.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2230 [2024-07-29 16:26:15,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.22 | bwd_microstep: 5258.51 | bwd_inner_microstep: 4849.44 | bwd_allreduce_microstep: 409.00 | step_microstep: 
0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3748 [2024-07-29 16:26:24,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3763.30 | bwd_microstep: 4997.47 | bwd_inner_microstep: 4978.15 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2180 [2024-07-29 16:26:33,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.03 | bwd_microstep: 5150.59 | bwd_inner_microstep: 4750.20 | bwd_allreduce_microstep: 400.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-29 16:26:40,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3199.19 | bwd_microstep: 4720.66 | bwd_inner_microstep: 4695.53 | bwd_allreduce_microstep: 25.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-29 16:26:49,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.54 | bwd_microstep: 5094.72 | bwd_inner_microstep: 5029.77 | bwd_allreduce_microstep: 64.88 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-29 16:26:58,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-29 16:26:58,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3682.67 | bwd_microstep: 4892.48 | bwd_inner_microstep: 4873.13 | bwd_allreduce_microstep: 19.28 | step_microstep: 182.11 [2024-07-29 16:26:58,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28019.75 | bwd: 40372.10 | bwd_inner: 38994.56 | bwd_allreduce: 1377.06 | step: 182.69 36%|███▌ | 242/671 [4:43:44<8:17:02, 69.52s/it] {'loss': 1.1924, 'learning_rate': 1.4817536741017153e-05, 'epoch': 0.36} 36%|███▌ | 242/671 [4:43:44<8:17:02, 69.52s/it]dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1404 
[2024-07-29 16:27:07,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.28 | bwd_microstep: 5487.31 | bwd_inner_microstep: 5065.88 | bwd_allreduce_microstep: 421.36 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3782 [2024-07-29 16:27:16,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.71 | bwd_microstep: 5072.73 | bwd_inner_microstep: 5048.00 | bwd_allreduce_microstep: 24.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621 [2024-07-29 16:27:25,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.31 | bwd_microstep: 5158.25 | bwd_inner_microstep: 5077.95 | bwd_allreduce_microstep: 80.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3773 [2024-07-29 16:27:33,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.44 | bwd_microstep: 5014.87 | bwd_inner_microstep: 4982.72 | bwd_allreduce_microstep: 32.09 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2252 [2024-07-29 16:27:42,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.46 | bwd_microstep: 5201.00 | bwd_inner_microstep: 4795.85 | bwd_allreduce_microstep: 405.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3662 [2024-07-29 16:27:51,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.00 | bwd_microstep: 5169.57 | bwd_inner_microstep: 5080.30 | bwd_allreduce_microstep: 89.20 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2146 [2024-07-29 16:28:00,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.32 | bwd_microstep: 5122.89 | bwd_inner_microstep: 4725.07 | bwd_allreduce_microstep: 397.74 | step_microstep: 0.19 dynamic ViT batch 
size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-29 16:28:08,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 16:28:08,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.45 | bwd_microstep: 5034.22 | bwd_inner_microstep: 4973.47 | bwd_allreduce_microstep: 60.69 | step_microstep: 181.18 [2024-07-29 16:28:08,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28820.88 | bwd: 41260.83 | bwd_inner: 39749.17 | bwd_allreduce: 1511.18 | step: 181.88 36%|███▌ | 243/671 [4:44:54<8:17:47, 69.78s/it] {'loss': 1.1404, 'learning_rate': 1.4775126812118865e-05, 'epoch': 0.36} 36%|███▌ | 243/671 [4:44:54<8:17:47, 69.78s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3912 [2024-07-29 16:28:17,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3824.49 | bwd_microstep: 5149.97 | bwd_inner_microstep: 5130.76 | bwd_allreduce_microstep: 19.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3590 [2024-07-29 16:28:26,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.39 | bwd_microstep: 5137.30 | bwd_inner_microstep: 5066.63 | bwd_allreduce_microstep: 70.61 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3619 [2024-07-29 16:28:35,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.06 | bwd_microstep: 5125.88 | bwd_inner_microstep: 5029.01 | bwd_allreduce_microstep: 96.80 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2204 [2024-07-29 16:28:43,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.14 | bwd_microstep: 5153.40 | bwd_inner_microstep: 4751.72 | bwd_allreduce_microstep: 401.61 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193 [2024-07-29 16:28:52,826] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.52 | bwd_microstep: 5253.17 | bwd_inner_microstep: 4847.02 | bwd_allreduce_microstep: 406.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3772 [2024-07-29 16:29:01,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.26 | bwd_microstep: 5023.71 | bwd_inner_microstep: 4986.74 | bwd_allreduce_microstep: 36.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-29 16:29:10,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.35 | bwd_microstep: 5117.64 | bwd_inner_microstep: 5049.22 | bwd_allreduce_microstep: 68.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-29 16:29:18,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 16:29:18,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.58 | bwd_microstep: 4947.59 | bwd_inner_microstep: 4903.63 | bwd_allreduce_microstep: 43.89 | step_microstep: 180.83 [2024-07-29 16:29:18,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28820.70 | bwd: 40908.64 | bwd_inner: 39764.69 | bwd_allreduce: 1143.48 | step: 181.56 36%|███▋ | 244/671 [4:46:04<8:17:13, 69.87s/it] {'loss': 1.1864, 'learning_rate': 1.473260533642565e-05, 'epoch': 0.36} 36%|███▋ | 244/671 [4:46:04<8:17:13, 69.87s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3919 [2024-07-29 16:29:28,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.13 | bwd_microstep: 5398.21 | bwd_inner_microstep: 5332.02 | bwd_allreduce_microstep: 66.12 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3836 [2024-07-29 16:29:36,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.44 | 
bwd_microstep: 5041.08 | bwd_inner_microstep: 5021.70 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3819 [2024-07-29 16:29:45,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.57 | bwd_microstep: 5055.72 | bwd_inner_microstep: 5034.81 | bwd_allreduce_microstep: 20.81 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2195 [2024-07-29 16:29:54,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3466.69 | bwd_microstep: 5106.11 | bwd_inner_microstep: 4708.82 | bwd_allreduce_microstep: 397.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3722 [2024-07-29 16:30:02,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.75 | bwd_microstep: 5099.01 | bwd_inner_microstep: 5052.66 | bwd_allreduce_microstep: 46.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-29 16:30:11,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.20 | bwd_microstep: 4987.21 | bwd_inner_microstep: 4967.85 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3771 [2024-07-29 16:30:20,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.56 | bwd_microstep: 4932.65 | bwd_inner_microstep: 4905.55 | bwd_allreduce_microstep: 27.03 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3732 [2024-07-29 16:30:29,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 16:30:29,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.03 | bwd_microstep: 5079.03 | bwd_inner_microstep: 5009.18 | bwd_allreduce_microstep: 69.78 | step_microstep: 180.71 [2024-07-29 
16:30:29,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29163.27 | bwd: 40699.00 | bwd_inner: 40032.56 | bwd_allreduce: 665.95 | step: 181.29 37%|███▋ | 245/671 [4:47:15<8:16:44, 69.96s/it] {'loss': 1.1742, 'learning_rate': 1.4689973307237687e-05, 'epoch': 0.36} 37%|███▋ | 245/671 [4:47:15<8:16:44, 69.96s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4053 [2024-07-29 16:30:38,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3876.30 | bwd_microstep: 5348.41 | bwd_inner_microstep: 5329.30 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3858 [2024-07-29 16:30:47,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3795.21 | bwd_microstep: 5122.85 | bwd_inner_microstep: 5103.55 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2057 [2024-07-29 16:30:55,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3017.27 | bwd_microstep: 4882.29 | bwd_inner_microstep: 4505.90 | bwd_allreduce_microstep: 376.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-29 16:31:03,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.54 | bwd_microstep: 5152.76 | bwd_inner_microstep: 5082.61 | bwd_allreduce_microstep: 70.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3755 [2024-07-29 16:31:12,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.37 | bwd_microstep: 5186.10 | bwd_inner_microstep: 5128.91 | bwd_allreduce_microstep: 57.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636 [2024-07-29 16:31:21,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.57 | bwd_microstep: 
5195.79 | bwd_inner_microstep: 5111.18 | bwd_allreduce_microstep: 84.55 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-07-29 16:31:30,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.77 | bwd_microstep: 5051.52 | bwd_inner_microstep: 5009.89 | bwd_allreduce_microstep: 41.57 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2183 [2024-07-29 16:31:39,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 16:31:39,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3479.77 | bwd_microstep: 5066.60 | bwd_inner_microstep: 4675.13 | bwd_allreduce_microstep: 391.40 | step_microstep: 180.88 [2024-07-29 16:31:39,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28798.72 | bwd: 41006.30 | bwd_inner: 39946.41 | bwd_allreduce: 1059.43 | step: 181.47 37%|███▋ | 246/671 [4:48:25<8:15:56, 70.01s/it] {'loss': 1.1345, 'learning_rate': 1.4647231720437687e-05, 'epoch': 0.37} 37%|███▋ | 246/671 [4:48:25<8:15:56, 70.01s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2316 [2024-07-29 16:31:47,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3104.53 | bwd_microstep: 5156.87 | bwd_inner_microstep: 4764.27 | bwd_allreduce_microstep: 392.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2269 [2024-07-29 16:31:56,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.72 | bwd_microstep: 5246.59 | bwd_inner_microstep: 4839.35 | bwd_allreduce_microstep: 407.18 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2258 [2024-07-29 16:32:05,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.38 | bwd_microstep: 5229.48 | bwd_inner_microstep: 4823.54 | bwd_allreduce_microstep: 405.88 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622 [2024-07-29 16:32:13,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.43 | bwd_microstep: 5168.11 | bwd_inner_microstep: 5085.75 | bwd_allreduce_microstep: 82.28 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-29 16:32:22,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.22 | bwd_microstep: 5179.75 | bwd_inner_microstep: 5102.65 | bwd_allreduce_microstep: 77.04 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-29 16:32:31,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.59 | bwd_microstep: 4897.36 | bwd_inner_microstep: 4878.02 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684 [2024-07-29 16:32:39,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3215.63 | bwd_microstep: 4733.72 | bwd_inner_microstep: 4708.27 | bwd_allreduce_microstep: 25.38 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2139 [2024-07-29 16:32:48,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.46 [2024-07-29 16:32:48,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.90 | bwd_microstep: 5113.33 | bwd_inner_microstep: 4716.28 | bwd_allreduce_microstep: 396.99 | step_microstep: 181.06 [2024-07-29 16:32:48,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27905.31 | bwd: 40725.20 | bwd_inner: 38918.07 | bwd_allreduce: 1806.65 | step: 181.65 37%|███▋ | 247/671 [4:49:34<8:12:32, 69.70s/it] {'loss': 1.1472, 'learning_rate': 1.4604381574467616e-05, 'epoch': 0.37} 37%|███▋ | 247/671 [4:49:34<8:12:32, 69.70s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token 
length: 3934 [2024-07-29 16:32:57,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3821.03 | bwd_microstep: 5229.42 | bwd_inner_microstep: 5197.02 | bwd_allreduce_microstep: 32.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3812 [2024-07-29 16:33:06,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.34 | bwd_microstep: 5050.26 | bwd_inner_microstep: 5030.88 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3797 [2024-07-29 16:33:14,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3646.75 | bwd_microstep: 5199.93 | bwd_inner_microstep: 5146.93 | bwd_allreduce_microstep: 52.93 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2102 [2024-07-29 16:33:23,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.72 | bwd_microstep: 5169.48 | bwd_inner_microstep: 4768.47 | bwd_allreduce_microstep: 400.94 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709 [2024-07-29 16:33:32,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.60 | bwd_microstep: 5094.07 | bwd_inner_microstep: 5044.75 | bwd_allreduce_microstep: 49.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-07-29 16:33:40,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3237.93 | bwd_microstep: 4804.07 | bwd_inner_microstep: 4784.70 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2109 [2024-07-29 16:33:49,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.15 | bwd_microstep: 5129.09 | bwd_inner_microstep: 4733.15 | bwd_allreduce_microstep: 395.87 | step_microstep: 0.10 dynamic 
ViT batch size: 16, images per sample: 8.0, dynamic token length: 3666 [2024-07-29 16:33:58,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 16:33:58,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3498.56 | bwd_microstep: 5291.05 | bwd_inner_microstep: 5108.10 | bwd_allreduce_microstep: 182.89 | step_microstep: 182.40 [2024-07-29 16:33:58,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28754.97 | bwd: 40967.34 | bwd_inner: 39813.94 | bwd_allreduce: 1152.92 | step: 183.10 37%|███▋ | 248/671 [4:50:44<8:12:07, 69.80s/it] {'loss': 1.2398, 'learning_rate': 1.4561423870305385e-05, 'epoch': 0.37} 37%|███▋ | 248/671 [4:50:44<8:12:07, 69.80s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3959 [2024-07-29 16:34:07,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3651.56 | bwd_microstep: 5202.53 | bwd_inner_microstep: 5160.69 | bwd_allreduce_microstep: 41.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3858 [2024-07-29 16:34:16,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.82 | bwd_microstep: 5100.69 | bwd_inner_microstep: 5081.25 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3806 [2024-07-29 16:34:24,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.56 | bwd_microstep: 5149.12 | bwd_inner_microstep: 5083.63 | bwd_allreduce_microstep: 65.43 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2187 [2024-07-29 16:34:33,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.89 | bwd_microstep: 5222.83 | bwd_inner_microstep: 4817.36 | bwd_allreduce_microstep: 405.40 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2167 [2024-07-29 
16:34:42,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.59 | bwd_microstep: 5151.16 | bwd_inner_microstep: 4752.13 | bwd_allreduce_microstep: 398.96 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2173 [2024-07-29 16:34:50,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2999.28 | bwd_microstep: 4875.61 | bwd_inner_microstep: 4498.66 | bwd_allreduce_microstep: 376.89 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-29 16:34:58,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.58 | bwd_microstep: 5096.58 | bwd_inner_microstep: 4698.91 | bwd_allreduce_microstep: 397.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-29 16:35:06,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 16:35:06,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3195.09 | bwd_microstep: 4723.35 | bwd_inner_microstep: 4696.66 | bwd_allreduce_microstep: 26.61 | step_microstep: 181.19 [2024-07-29 16:35:06,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27824.26 | bwd: 40521.85 | bwd_inner: 38789.23 | bwd_allreduce: 1732.14 | step: 181.74 37%|███▋ | 249/671 [4:51:52<8:08:34, 69.47s/it] {'loss': 1.1712, 'learning_rate': 1.4518359611441452e-05, 'epoch': 0.37} 37%|███▋ | 249/671 [4:51:52<8:08:34, 69.47s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3591 [2024-07-29 16:35:15,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3655.81 | bwd_microstep: 5274.76 | bwd_inner_microstep: 5180.14 | bwd_allreduce_microstep: 94.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2295 [2024-07-29 16:35:24,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3536.02 | bwd_microstep: 5195.37 | bwd_inner_microstep: 4792.43 | bwd_allreduce_microstep: 402.87 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2076 [2024-07-29 16:35:33,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.03 | bwd_microstep: 5265.90 | bwd_inner_microstep: 4860.54 | bwd_allreduce_microstep: 405.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2236 [2024-07-29 16:35:42,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3506.57 | bwd_microstep: 5156.59 | bwd_inner_microstep: 4753.43 | bwd_allreduce_microstep: 403.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642 [2024-07-29 16:35:50,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3186.57 | bwd_microstep: 4747.84 | bwd_inner_microstep: 4714.42 | bwd_allreduce_microstep: 33.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-29 16:35:57,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3190.14 | bwd_microstep: 4688.56 | bwd_inner_microstep: 4669.17 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-07-29 16:36:06,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.78 | bwd_microstep: 4996.71 | bwd_inner_microstep: 4948.81 | bwd_allreduce_microstep: 47.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-29 16:36:15,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 16:36:15,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.28 | bwd_microstep: 5055.12 | bwd_inner_microstep: 4998.91 | bwd_allreduce_microstep: 56.14 | step_microstep: 181.07 [2024-07-29 
16:36:15,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27788.08 | bwd: 40380.83 | bwd_inner: 38917.79 | bwd_allreduce: 1462.55 | step: 181.64 37%|███▋ | 250/671 [4:53:01<8:05:22, 69.17s/it] {'loss': 1.1982, 'learning_rate': 1.4475189803855399e-05, 'epoch': 0.37} 37%|███▋ | 250/671 [4:53:01<8:05:22, 69.17s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3897 [2024-07-29 16:36:24,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3786.52 | bwd_microstep: 5129.58 | bwd_inner_microstep: 5110.35 | bwd_allreduce_microstep: 19.15 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2235 [2024-07-29 16:36:33,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.43 | bwd_microstep: 5211.77 | bwd_inner_microstep: 4807.04 | bwd_allreduce_microstep: 404.67 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2221 [2024-07-29 16:36:41,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3463.80 | bwd_microstep: 5105.11 | bwd_inner_microstep: 4709.55 | bwd_allreduce_microstep: 395.49 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3747 [2024-07-29 16:36:50,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.89 | bwd_microstep: 5043.18 | bwd_inner_microstep: 4981.89 | bwd_allreduce_microstep: 61.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650 [2024-07-29 16:36:59,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.45 | bwd_microstep: 5145.28 | bwd_inner_microstep: 5074.68 | bwd_allreduce_microstep: 70.53 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3759 [2024-07-29 16:37:07,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.74 | bwd_microstep: 
5015.08 | bwd_inner_microstep: 4995.64 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2173 [2024-07-29 16:37:15,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3031.90 | bwd_microstep: 4926.55 | bwd_inner_microstep: 4546.66 | bwd_allreduce_microstep: 379.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-29 16:37:24,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.44 [2024-07-29 16:37:24,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.00 | bwd_microstep: 5033.94 | bwd_inner_microstep: 4977.60 | bwd_allreduce_microstep: 56.27 | step_microstep: 181.43 [2024-07-29 16:37:24,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28337.63 | bwd: 40610.47 | bwd_inner: 39203.35 | bwd_allreduce: 1406.63 | step: 182.02 37%|███▋ | 251/671 [4:54:10<8:04:25, 69.20s/it] {'loss': 1.1724, 'learning_rate': 1.4431915455992416e-05, 'epoch': 0.37} 37%|███▋ | 251/671 [4:54:10<8:04:25, 69.20s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3970 [2024-07-29 16:37:33,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3845.42 | bwd_microstep: 5291.52 | bwd_inner_microstep: 5272.31 | bwd_allreduce_microstep: 19.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3849 [2024-07-29 16:37:42,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.06 | bwd_microstep: 5216.29 | bwd_inner_microstep: 5165.46 | bwd_allreduce_microstep: 50.77 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3780 [2024-07-29 16:37:51,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.59 | bwd_microstep: 5161.55 | bwd_inner_microstep: 5110.71 | bwd_allreduce_microstep: 50.77 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650 [2024-07-29 16:38:00,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.51 | bwd_microstep: 5074.64 | bwd_inner_microstep: 5016.29 | bwd_allreduce_microstep: 58.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2208 [2024-07-29 16:38:08,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.67 | bwd_microstep: 5227.76 | bwd_inner_microstep: 4822.13 | bwd_allreduce_microstep: 405.56 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-07-29 16:38:17,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.12 | bwd_microstep: 4937.21 | bwd_inner_microstep: 4905.35 | bwd_allreduce_microstep: 31.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-07-29 16:38:26,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.42 | bwd_microstep: 5041.20 | bwd_inner_microstep: 4983.37 | bwd_allreduce_microstep: 57.76 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2148 [2024-07-29 16:38:34,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 16:38:34,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.98 | bwd_microstep: 5089.53 | bwd_inner_microstep: 4693.52 | bwd_allreduce_microstep: 395.95 | step_microstep: 181.21 [2024-07-29 16:38:34,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28885.67 | bwd: 41039.67 | bwd_inner: 39969.09 | bwd_allreduce: 1070.11 | step: 181.89 38%|███▊ | 252/671 [4:55:20<8:05:28, 69.52s/it] {'loss': 1.1742, 'learning_rate': 1.438853757873975e-05, 'epoch': 0.38} 38%|███▊ | 252/671 [4:55:20<8:05:28, 69.52s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token 
length: 3580 [2024-07-29 16:38:43,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.71 | bwd_microstep: 5337.62 | bwd_inner_microstep: 5225.04 | bwd_allreduce_microstep: 112.51 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2229 [2024-07-29 16:38:52,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.69 | bwd_microstep: 5191.13 | bwd_inner_microstep: 4787.29 | bwd_allreduce_microstep: 403.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3750 [2024-07-29 16:39:01,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.38 | bwd_microstep: 4992.93 | bwd_inner_microstep: 4973.62 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-29 16:39:10,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.85 | bwd_microstep: 5118.47 | bwd_inner_microstep: 5048.50 | bwd_allreduce_microstep: 69.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631 [2024-07-29 16:39:18,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.24 | bwd_microstep: 5055.03 | bwd_inner_microstep: 4987.60 | bwd_allreduce_microstep: 67.35 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150 [2024-07-29 16:39:27,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.89 | bwd_microstep: 5121.87 | bwd_inner_microstep: 4725.97 | bwd_allreduce_microstep: 395.83 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3667 [2024-07-29 16:39:35,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3060.11 | bwd_microstep: 4691.05 | bwd_inner_microstep: 4664.32 | bwd_allreduce_microstep: 26.67 | step_microstep: 0.08 dynamic 
ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158 [2024-07-29 16:39:43,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.46 [2024-07-29 16:39:43,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3497.48 | bwd_microstep: 5106.23 | bwd_inner_microstep: 4709.56 | bwd_allreduce_microstep: 396.60 | step_microstep: 181.88 [2024-07-29 16:39:43,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28113.25 | bwd: 40614.32 | bwd_inner: 39121.85 | bwd_allreduce: 1492.00 | step: 182.46 38%|███▊ | 253/671 [4:56:29<8:03:20, 69.38s/it] {'loss': 1.199, 'learning_rate': 1.4345057185403098e-05, 'epoch': 0.38} 38%|███▊ | 253/671 [4:56:29<8:03:20, 69.38s/it]dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1841 [2024-07-29 16:39:53,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.17 | bwd_microstep: 5411.47 | bwd_inner_microstep: 4991.35 | bwd_allreduce_microstep: 420.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3575 [2024-07-29 16:40:01,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.66 | bwd_microstep: 5124.72 | bwd_inner_microstep: 5040.06 | bwd_allreduce_microstep: 84.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3730 [2024-07-29 16:40:10,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.63 | bwd_microstep: 5179.30 | bwd_inner_microstep: 5122.55 | bwd_allreduce_microstep: 56.68 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2211 [2024-07-29 16:40:18,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3019.97 | bwd_microstep: 4911.53 | bwd_inner_microstep: 4534.31 | bwd_allreduce_microstep: 377.16 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2089 [2024-07-29 
16:40:27,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.84 | bwd_microstep: 5150.13 | bwd_inner_microstep: 4750.96 | bwd_allreduce_microstep: 399.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737 [2024-07-29 16:40:35,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.43 | bwd_microstep: 5174.86 | bwd_inner_microstep: 5118.57 | bwd_allreduce_microstep: 56.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735 [2024-07-29 16:40:44,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.23 | bwd_microstep: 4999.28 | bwd_inner_microstep: 4979.97 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2157 [2024-07-29 16:40:53,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 16:40:53,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3466.84 | bwd_microstep: 5046.92 | bwd_inner_microstep: 4656.27 | bwd_allreduce_microstep: 390.58 | step_microstep: 181.82 [2024-07-29 16:40:53,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28132.69 | bwd: 40998.19 | bwd_inner: 39193.97 | bwd_allreduce: 1803.75 | step: 182.39 38%|███▊ | 254/671 [4:57:39<8:02:20, 69.40s/it] {'loss': 1.1732, 'learning_rate': 1.430147529168292e-05, 'epoch': 0.38} 38%|███▊ | 254/671 [4:57:39<8:02:20, 69.40s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3670 [2024-07-29 16:41:02,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.47 | bwd_microstep: 5518.60 | bwd_inner_microstep: 5406.96 | bwd_allreduce_microstep: 111.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3562 [2024-07-29 16:41:11,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3566.93 | bwd_microstep: 5117.50 | bwd_inner_microstep: 5036.61 | bwd_allreduce_microstep: 80.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3760 [2024-07-29 16:41:19,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3246.42 | bwd_microstep: 4878.07 | bwd_inner_microstep: 4849.94 | bwd_allreduce_microstep: 28.06 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3810 [2024-07-29 16:41:28,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3772.25 | bwd_microstep: 5040.89 | bwd_inner_microstep: 5019.94 | bwd_allreduce_microstep: 20.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650 [2024-07-29 16:41:36,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.01 | bwd_microstep: 4990.58 | bwd_inner_microstep: 4938.44 | bwd_allreduce_microstep: 52.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3726 [2024-07-29 16:41:45,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.63 | bwd_microstep: 5110.34 | bwd_inner_microstep: 5064.15 | bwd_allreduce_microstep: 46.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-29 16:41:54,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.78 | bwd_microstep: 5078.49 | bwd_inner_microstep: 5017.72 | bwd_allreduce_microstep: 60.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-07-29 16:42:02,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-29 16:42:02,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.12 | bwd_microstep: 4920.78 | bwd_inner_microstep: 4881.42 | bwd_allreduce_microstep: 39.29 | step_microstep: 181.01 [2024-07-29 
16:42:02,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28551.52 | bwd: 40655.23 | bwd_inner: 40215.12 | bwd_allreduce: 439.65 | step: 181.59 38%|███▊ | 255/671 [4:58:48<8:01:28, 69.44s/it] {'loss': 1.2167, 'learning_rate': 1.4257792915650728e-05, 'epoch': 0.38} 38%|███▊ | 255/671 [4:58:48<8:01:28, 69.44s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3558 [2024-07-29 16:42:12,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.34 | bwd_microstep: 5393.69 | bwd_inner_microstep: 5243.02 | bwd_allreduce_microstep: 150.60 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2231 [2024-07-29 16:42:20,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.59 | bwd_microstep: 5272.83 | bwd_inner_microstep: 4865.35 | bwd_allreduce_microstep: 407.42 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3625 [2024-07-29 16:42:29,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.75 | bwd_microstep: 5181.16 | bwd_inner_microstep: 5090.16 | bwd_allreduce_microstep: 90.94 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3773 [2024-07-29 16:42:38,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.23 | bwd_microstep: 5088.76 | bwd_inner_microstep: 5018.22 | bwd_allreduce_microstep: 70.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-29 16:42:46,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.66 | bwd_microstep: 5006.78 | bwd_inner_microstep: 4948.94 | bwd_allreduce_microstep: 57.78 | step_microstep: 0.18 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3758 [2024-07-29 16:42:55,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.81 | bwd_microstep: 
5168.97 | bwd_inner_microstep: 5100.47 | bwd_allreduce_microstep: 68.44 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735 [2024-07-29 16:43:04,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3774.53 | bwd_microstep: 5014.60 | bwd_inner_microstep: 4989.65 | bwd_allreduce_microstep: 24.88 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704 [2024-07-29 16:43:13,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.77 [2024-07-29 16:43:13,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3703.33 | bwd_microstep: 4918.78 | bwd_inner_microstep: 4895.54 | bwd_allreduce_microstep: 23.16 | step_microstep: 182.34 [2024-07-29 16:43:13,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29060.16 | bwd: 41045.55 | bwd_inner: 40151.28 | bwd_allreduce: 893.80 | step: 183.02 38%|███▊ | 256/671 [4:59:59<8:02:22, 69.74s/it] {'loss': 1.1388, 'learning_rate': 1.4214011077725291e-05, 'epoch': 0.38} 38%|███▊ | 256/671 [4:59:59<8:02:22, 69.74s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3984 [2024-07-29 16:43:22,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.74 | bwd_microstep: 5108.31 | bwd_inner_microstep: 5089.20 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2312 [2024-07-29 16:43:31,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.88 | bwd_microstep: 5234.27 | bwd_inner_microstep: 4828.48 | bwd_allreduce_microstep: 405.72 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3611 [2024-07-29 16:43:39,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.21 | bwd_microstep: 5111.80 | bwd_inner_microstep: 5040.33 | bwd_allreduce_microstep: 71.40 | 
step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2235 [2024-07-29 16:43:48,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3506.35 | bwd_microstep: 5157.99 | bwd_inner_microstep: 4754.01 | bwd_allreduce_microstep: 403.92 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3756 [2024-07-29 16:43:57,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.09 | bwd_microstep: 5126.46 | bwd_inner_microstep: 5055.96 | bwd_allreduce_microstep: 70.43 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736 [2024-07-29 16:44:06,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3767.67 | bwd_microstep: 5056.11 | bwd_inner_microstep: 5028.67 | bwd_allreduce_microstep: 27.37 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2179 [2024-07-29 16:44:14,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.47 | bwd_microstep: 5238.32 | bwd_inner_microstep: 4830.08 | bwd_allreduce_microstep: 408.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2184 [2024-07-29 16:44:23,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 16:44:23,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.38 | bwd_microstep: 5236.21 | bwd_inner_microstep: 4828.94 | bwd_allreduce_microstep: 407.21 | step_microstep: 180.80 [2024-07-29 16:44:23,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28822.69 | bwd: 41269.44 | bwd_inner: 39455.62 | bwd_allreduce: 1813.36 | step: 181.39 38%|███▊ | 257/671 [5:01:09<8:02:36, 69.94s/it] {'loss': 1.2166, 'learning_rate': 1.4170130800648814e-05, 'epoch': 0.38} 38%|███▊ | 257/671 [5:01:09<8:02:36, 69.94s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token 
length: 3571 [2024-07-29 16:44:32,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.36 | bwd_microstep: 5185.83 | bwd_inner_microstep: 5097.38 | bwd_allreduce_microstep: 88.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3786 [2024-07-29 16:44:41,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.11 | bwd_microstep: 5164.51 | bwd_inner_microstep: 5113.01 | bwd_allreduce_microstep: 51.44 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3814 [2024-07-29 16:44:50,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.14 | bwd_microstep: 5050.02 | bwd_inner_microstep: 5030.69 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2235 [2024-07-29 16:44:58,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.87 | bwd_microstep: 5176.12 | bwd_inner_microstep: 4775.95 | bwd_allreduce_microstep: 400.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631 [2024-07-29 16:45:07,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.32 | bwd_microstep: 5185.13 | bwd_inner_microstep: 5098.95 | bwd_allreduce_microstep: 86.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-07-29 16:45:16,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.85 | bwd_microstep: 5092.69 | bwd_inner_microstep: 5027.47 | bwd_allreduce_microstep: 65.15 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-29 16:45:25,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.21 | bwd_microstep: 5032.42 | bwd_inner_microstep: 4978.58 | bwd_allreduce_microstep: 53.77 | step_microstep: 0.08 dynamic 
ViT batch size: 12, images per sample: 6.0, dynamic token length: 3152 [2024-07-29 16:45:33,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.39 [2024-07-29 16:45:33,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3067.55 | bwd_microstep: 4887.70 | bwd_inner_microstep: 4716.01 | bwd_allreduce_microstep: 171.62 | step_microstep: 181.04 [2024-07-29 16:45:33,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28358.34 | bwd: 40774.41 | bwd_inner: 39837.99 | bwd_allreduce: 935.95 | step: 181.60 38%|███▊ | 258/671 [5:02:19<8:00:26, 69.80s/it] {'loss': 1.1451, 'learning_rate': 1.4126153109463025e-05, 'epoch': 0.38} 38%|███▊ | 258/671 [5:02:19<8:00:26, 69.80s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4042 [2024-07-29 16:45:42,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3688.21 | bwd_microstep: 5149.73 | bwd_inner_microstep: 5130.62 | bwd_allreduce_microstep: 19.04 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3814 [2024-07-29 16:45:50,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.77 | bwd_microstep: 5131.79 | bwd_inner_microstep: 5088.23 | bwd_allreduce_microstep: 43.49 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3615 [2024-07-29 16:45:59,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.05 | bwd_microstep: 5101.94 | bwd_inner_microstep: 5021.62 | bwd_allreduce_microstep: 80.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743 [2024-07-29 16:46:08,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.77 | bwd_microstep: 5005.57 | bwd_inner_microstep: 4986.21 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2218 [2024-07-29 
16:46:17,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.19 | bwd_microstep: 5179.25 | bwd_inner_microstep: 4776.50 | bwd_allreduce_microstep: 402.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-29 16:46:25,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.49 | bwd_microstep: 5024.53 | bwd_inner_microstep: 4962.62 | bwd_allreduce_microstep: 61.85 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-29 16:46:34,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.21 | bwd_microstep: 5180.17 | bwd_inner_microstep: 5103.01 | bwd_allreduce_microstep: 77.10 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3655 [2024-07-29 16:46:42,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-29 16:46:42,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3111.96 | bwd_microstep: 4970.30 | bwd_inner_microstep: 4910.14 | bwd_allreduce_microstep: 60.09 | step_microstep: 181.44 [2024-07-29 16:46:42,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28462.56 | bwd: 40743.26 | bwd_inner: 39978.87 | bwd_allreduce: 763.92 | step: 182.05 39%|███▊ | 259/671 [5:03:28<7:58:44, 69.72s/it] {'loss': 1.1513, 'learning_rate': 1.4082079031485253e-05, 'epoch': 0.39} 39%|███▊ | 259/671 [5:03:28<7:58:44, 69.72s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3920 [2024-07-29 16:46:51,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3670.15 | bwd_microstep: 5278.36 | bwd_inner_microstep: 5231.69 | bwd_allreduce_microstep: 46.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3889 [2024-07-29 16:47:00,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3681.16 | bwd_microstep: 5289.39 | bwd_inner_microstep: 5232.36 | bwd_allreduce_microstep: 56.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3788 [2024-07-29 16:47:09,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.70 | bwd_microstep: 5026.05 | bwd_inner_microstep: 5006.69 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3813 [2024-07-29 16:47:18,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.60 | bwd_microstep: 5047.97 | bwd_inner_microstep: 5028.56 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3740 [2024-07-29 16:47:26,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3266.71 | bwd_microstep: 4871.66 | bwd_inner_microstep: 4842.10 | bwd_allreduce_microstep: 29.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3742 [2024-07-29 16:47:35,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.86 | bwd_microstep: 4952.22 | bwd_inner_microstep: 4918.90 | bwd_allreduce_microstep: 33.25 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2157 [2024-07-29 16:47:43,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3026.61 | bwd_microstep: 4896.66 | bwd_inner_microstep: 4521.08 | bwd_allreduce_microstep: 375.51 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2152 [2024-07-29 16:47:51,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-29 16:47:51,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3467.12 | bwd_microstep: 5060.83 | bwd_inner_microstep: 4668.60 | bwd_allreduce_microstep: 392.17 | step_microstep: 186.90 [2024-07-29 
16:47:51,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28234.81 | bwd: 40423.13 | bwd_inner: 39449.93 | bwd_allreduce: 972.73 | step: 187.57 39%|███▊ | 260/671 [5:04:37<7:56:04, 69.50s/it] {'loss': 1.1704, 'learning_rate': 1.4037909596284411e-05, 'epoch': 0.39} 39%|███▊ | 260/671 [5:04:37<7:56:04, 69.50s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2420 [2024-07-29 16:47:57,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2258.19 | bwd_microstep: 3284.07 | bwd_inner_microstep: 3264.70 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3823 [2024-07-29 16:48:06,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.63 | bwd_microstep: 5038.01 | bwd_inner_microstep: 5018.72 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621 [2024-07-29 16:48:14,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.15 | bwd_microstep: 5162.15 | bwd_inner_microstep: 5085.66 | bwd_allreduce_microstep: 76.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622 [2024-07-29 16:48:23,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3236.80 | bwd_microstep: 4866.53 | bwd_inner_microstep: 4817.96 | bwd_allreduce_microstep: 48.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645 [2024-07-29 16:48:31,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.20 | bwd_microstep: 5134.05 | bwd_inner_microstep: 5058.71 | bwd_allreduce_microstep: 75.29 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3675 [2024-07-29 16:48:40,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.15 | bwd_microstep: 
5031.59 | bwd_inner_microstep: 4958.67 | bwd_allreduce_microstep: 72.85 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-07-29 16:48:49,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.86 | bwd_microstep: 5001.40 | bwd_inner_microstep: 4982.13 | bwd_allreduce_microstep: 19.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-29 16:48:58,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 16:48:58,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.02 | bwd_microstep: 5097.26 | bwd_inner_microstep: 5033.46 | bwd_allreduce_microstep: 63.74 | step_microstep: 181.55 [2024-07-29 16:48:58,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27351.90 | bwd: 38615.04 | bwd_inner: 38219.95 | bwd_allreduce: 394.62 | step: 182.12 39%|███▉ | 261/671 [5:05:44<7:48:20, 68.54s/it] {'loss': 1.1971, 'learning_rate': 1.3993645835656957e-05, 'epoch': 0.39} 39%|███▉ | 261/671 [5:05:44<7:48:20, 68.54s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3780 [2024-07-29 16:49:06,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.61 | bwd_microstep: 5170.35 | bwd_inner_microstep: 5101.35 | bwd_allreduce_microstep: 68.93 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3823 [2024-07-29 16:49:15,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.18 | bwd_microstep: 5060.53 | bwd_inner_microstep: 5041.24 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2214 [2024-07-29 16:49:23,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3071.59 | bwd_microstep: 5064.40 | bwd_inner_microstep: 4675.42 | bwd_allreduce_microstep: 388.91 | step_microstep: 
0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3743 [2024-07-29 16:49:31,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3226.95 | bwd_microstep: 4815.54 | bwd_inner_microstep: 4793.85 | bwd_allreduce_microstep: 21.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3729 [2024-07-29 16:49:40,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3243.10 | bwd_microstep: 4796.33 | bwd_inner_microstep: 4776.95 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-29 16:49:48,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3767.32 | bwd_microstep: 5004.06 | bwd_inner_microstep: 4980.67 | bwd_allreduce_microstep: 23.33 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2179 [2024-07-29 16:49:57,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.90 | bwd_microstep: 5149.74 | bwd_inner_microstep: 4749.93 | bwd_allreduce_microstep: 399.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-29 16:50:05,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 16:50:05,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3184.66 | bwd_microstep: 4694.70 | bwd_inner_microstep: 4675.36 | bwd_allreduce_microstep: 19.28 | step_microstep: 181.00 [2024-07-29 16:50:05,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27373.23 | bwd: 39755.64 | bwd_inner: 38794.72 | bwd_allreduce: 960.45 | step: 181.56 39%|███▉ | 262/671 [5:06:51<7:44:59, 68.21s/it] {'loss': 1.2009, 'learning_rate': 1.394928878360279e-05, 'epoch': 0.39} 39%|███▉ | 262/671 [5:06:51<7:44:59, 68.21s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3870 
[2024-07-29 16:50:14,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.90 | bwd_microstep: 5286.77 | bwd_inner_microstep: 5205.35 | bwd_allreduce_microstep: 81.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3764 [2024-07-29 16:50:22,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3257.95 | bwd_microstep: 4883.21 | bwd_inner_microstep: 4855.45 | bwd_allreduce_microstep: 27.69 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2237 [2024-07-29 16:50:31,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3469.56 | bwd_microstep: 5132.60 | bwd_inner_microstep: 4735.83 | bwd_allreduce_microstep: 396.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3803 [2024-07-29 16:50:40,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.48 | bwd_microstep: 5179.35 | bwd_inner_microstep: 5127.43 | bwd_allreduce_microstep: 51.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-29 16:50:48,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.49 | bwd_microstep: 5095.28 | bwd_inner_microstep: 5027.92 | bwd_allreduce_microstep: 67.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-29 16:50:57,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.05 | bwd_microstep: 5032.52 | bwd_inner_microstep: 5007.74 | bwd_allreduce_microstep: 24.71 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-07-29 16:51:06,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.49 | bwd_microstep: 4937.78 | bwd_inner_microstep: 4911.81 | bwd_allreduce_microstep: 25.90 | step_microstep: 0.08 dynamic ViT batch 
size: 20, images per sample: 10.0, dynamic token length: 3664 [2024-07-29 16:51:15,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 16:51:15,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.14 | bwd_microstep: 4998.68 | bwd_inner_microstep: 4949.75 | bwd_allreduce_microstep: 48.86 | step_microstep: 182.10 [2024-07-29 16:51:15,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28617.98 | bwd: 40546.17 | bwd_inner: 39821.22 | bwd_allreduce: 724.48 | step: 182.68 39%|███▉ | 263/671 [5:08:01<7:46:27, 68.60s/it] {'loss': 1.2543, 'learning_rate': 1.3904839476301088e-05, 'epoch': 0.39} 39%|███▉ | 263/671 [5:08:01<7:46:27, 68.60s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3905 [2024-07-29 16:51:24,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3811.48 | bwd_microstep: 5175.10 | bwd_inner_microstep: 5154.10 | bwd_allreduce_microstep: 20.93 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2285 [2024-07-29 16:51:32,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.73 | bwd_microstep: 5254.73 | bwd_inner_microstep: 4847.55 | bwd_allreduce_microstep: 407.12 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3787 [2024-07-29 16:51:41,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.89 | bwd_microstep: 5128.69 | bwd_inner_microstep: 5083.55 | bwd_allreduce_microstep: 45.07 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3634 [2024-07-29 16:51:50,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.22 | bwd_microstep: 5173.21 | bwd_inner_microstep: 5080.57 | bwd_allreduce_microstep: 92.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-07-29 16:51:59,061] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.28 | bwd_microstep: 5010.41 | bwd_inner_microstep: 4973.36 | bwd_allreduce_microstep: 36.98 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175 [2024-07-29 16:52:07,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.34 | bwd_microstep: 5219.95 | bwd_inner_microstep: 4816.05 | bwd_allreduce_microstep: 403.83 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3673 [2024-07-29 16:52:16,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.57 | bwd_microstep: 5057.69 | bwd_inner_microstep: 4988.52 | bwd_allreduce_microstep: 69.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-29 16:52:25,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.45 [2024-07-29 16:52:25,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.38 | bwd_microstep: 4981.79 | bwd_inner_microstep: 4933.70 | bwd_allreduce_microstep: 48.03 | step_microstep: 180.82 [2024-07-29 16:52:25,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28834.78 | bwd: 41001.54 | bwd_inner: 39877.35 | bwd_allreduce: 1123.73 | step: 181.49 39%|███▉ | 264/671 [5:09:11<7:48:31, 69.07s/it] {'loss': 1.1814, 'learning_rate': 1.3860298952086115e-05, 'epoch': 0.39} 39%|███▉ | 264/671 [5:09:11<7:48:31, 69.07s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2362 [2024-07-29 16:52:34,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.29 | bwd_microstep: 5943.19 | bwd_inner_microstep: 5517.04 | bwd_allreduce_microstep: 426.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3894 [2024-07-29 16:52:43,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.91 | 
bwd_microstep: 5175.58 | bwd_inner_microstep: 5131.66 | bwd_allreduce_microstep: 43.85 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3614 [2024-07-29 16:52:52,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.89 | bwd_microstep: 5220.05 | bwd_inner_microstep: 5124.54 | bwd_allreduce_microstep: 95.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3736 [2024-07-29 16:53:01,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.95 | bwd_microstep: 5103.24 | bwd_inner_microstep: 5059.99 | bwd_allreduce_microstep: 43.18 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2222 [2024-07-29 16:53:10,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.52 | bwd_microstep: 5238.80 | bwd_inner_microstep: 4830.47 | bwd_allreduce_microstep: 408.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-29 16:53:18,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.20 | bwd_microstep: 5207.40 | bwd_inner_microstep: 5129.62 | bwd_allreduce_microstep: 77.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3656 [2024-07-29 16:53:27,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.62 | bwd_microstep: 5168.85 | bwd_inner_microstep: 5095.84 | bwd_allreduce_microstep: 72.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705 [2024-07-29 16:53:36,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 16:53:36,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.51 | bwd_microstep: 5028.46 | bwd_inner_microstep: 4974.04 | bwd_allreduce_microstep: 54.35 | step_microstep: 180.59 [2024-07-29 
16:53:36,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28873.78 | bwd: 42085.55 | bwd_inner: 40863.13 | bwd_allreduce: 1221.95 | step: 181.18 39%|███▉ | 265/671 [5:10:22<7:51:53, 69.74s/it] {'loss': 1.1364, 'learning_rate': 1.3815668251422953e-05, 'epoch': 0.39} 39%|███▉ | 265/671 [5:10:22<7:51:53, 69.74s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-29 16:53:45,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.77 | bwd_microstep: 5190.47 | bwd_inner_microstep: 5117.39 | bwd_allreduce_microstep: 73.02 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2269 [2024-07-29 16:53:54,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.43 | bwd_microstep: 5268.03 | bwd_inner_microstep: 4858.27 | bwd_allreduce_microstep: 409.70 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740 [2024-07-29 16:54:03,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.43 | bwd_microstep: 5018.68 | bwd_inner_microstep: 4994.71 | bwd_allreduce_microstep: 23.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626 [2024-07-29 16:54:11,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.12 | bwd_microstep: 5124.12 | bwd_inner_microstep: 5047.58 | bwd_allreduce_microstep: 76.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3725 [2024-07-29 16:54:20,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.31 | bwd_microstep: 5004.76 | bwd_inner_microstep: 4982.58 | bwd_allreduce_microstep: 22.11 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-29 16:54:29,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.76 | bwd_microstep: 
5140.75 | bwd_inner_microstep: 5086.47 | bwd_allreduce_microstep: 54.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-29 16:54:37,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.76 | bwd_microstep: 5035.05 | bwd_inner_microstep: 4979.20 | bwd_allreduce_microstep: 55.79 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2130 [2024-07-29 16:54:47,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 16:54:47,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.10 | bwd_microstep: 5120.39 | bwd_inner_microstep: 4721.61 | bwd_allreduce_microstep: 398.71 | step_microstep: 481.03 [2024-07-29 16:54:47,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29073.59 | bwd: 40902.23 | bwd_inner: 39787.74 | bwd_allreduce: 1114.02 | step: 481.60 40%|███▉ | 266/671 [5:11:33<7:52:28, 70.00s/it] {'loss': 1.194, 'learning_rate': 1.3770948416883205e-05, 'epoch': 0.4} 40%|███▉ | 266/671 [5:11:33<7:52:28, 70.00s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 1992 [2024-07-29 16:54:55,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.98 | bwd_microstep: 5231.62 | bwd_inner_microstep: 4827.87 | bwd_allreduce_microstep: 403.68 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2299 [2024-07-29 16:55:04,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.90 | bwd_microstep: 5203.11 | bwd_inner_microstep: 4799.64 | bwd_allreduce_microstep: 403.41 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3760 [2024-07-29 16:55:13,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.46 | bwd_microstep: 5166.60 | bwd_inner_microstep: 5085.86 | bwd_allreduce_microstep: 80.67 | step_microstep: 
0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3773 [2024-07-29 16:55:22,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3941.62 | bwd_microstep: 5094.66 | bwd_inner_microstep: 5064.71 | bwd_allreduce_microstep: 29.88 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3765 [2024-07-29 16:55:31,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.62 | bwd_microstep: 5004.38 | bwd_inner_microstep: 4985.04 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635 [2024-07-29 16:55:39,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3191.23 | bwd_microstep: 4712.21 | bwd_inner_microstep: 4688.42 | bwd_allreduce_microstep: 23.72 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3666 [2024-07-29 16:55:48,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.93 | bwd_microstep: 5207.92 | bwd_inner_microstep: 5119.69 | bwd_allreduce_microstep: 88.17 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705 [2024-07-29 16:55:56,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 16:55:56,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3212.89 | bwd_microstep: 4718.55 | bwd_inner_microstep: 4695.22 | bwd_allreduce_microstep: 23.26 | step_microstep: 765.12 [2024-07-29 16:55:56,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28414.52 | bwd: 40339.02 | bwd_inner: 39266.39 | bwd_allreduce: 1072.15 | step: 765.72 40%|███▉ | 267/671 [5:12:42<7:50:38, 69.90s/it] {'loss': 1.1886, 'learning_rate': 1.3726140493120639e-05, 'epoch': 0.4} 40%|███▉ | 267/671 [5:12:42<7:50:38, 69.90s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3959 
[2024-07-29 16:56:05,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3851.01 | bwd_microstep: 5201.36 | bwd_inner_microstep: 5182.23 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3827 [2024-07-29 16:56:14,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3790.19 | bwd_microstep: 5169.07 | bwd_inner_microstep: 5135.62 | bwd_allreduce_microstep: 33.38 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2221 [2024-07-29 16:56:23,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.50 | bwd_microstep: 5250.66 | bwd_inner_microstep: 4843.41 | bwd_allreduce_microstep: 407.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3785 [2024-07-29 16:56:32,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.01 | bwd_microstep: 5411.87 | bwd_inner_microstep: 5369.66 | bwd_allreduce_microstep: 42.15 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3638 [2024-07-29 16:56:41,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.31 | bwd_microstep: 5107.08 | bwd_inner_microstep: 5009.58 | bwd_allreduce_microstep: 97.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-07-29 16:56:50,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.70 | bwd_microstep: 5162.57 | bwd_inner_microstep: 4761.09 | bwd_allreduce_microstep: 401.41 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3730 [2024-07-29 16:56:58,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.31 | bwd_microstep: 4946.04 | bwd_inner_microstep: 4914.09 | bwd_allreduce_microstep: 31.89 | step_microstep: 0.08 dynamic ViT batch 
size: 14, images per sample: 7.0, dynamic token length: 3688 [2024-07-29 16:57:07,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 16:57:07,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.04 | bwd_microstep: 5064.18 | bwd_inner_microstep: 4977.94 | bwd_allreduce_microstep: 86.18 | step_microstep: 519.13 [2024-07-29 16:57:07,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29129.98 | bwd: 41312.81 | bwd_inner: 40193.56 | bwd_allreduce: 1118.78 | step: 519.81 40%|███▉ | 268/671 [5:13:53<7:51:54, 70.26s/it] {'loss': 1.1643, 'learning_rate': 1.3681245526846782e-05, 'epoch': 0.4} 40%|███▉ | 268/671 [5:13:53<7:51:54, 70.26s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2009 [2024-07-29 16:57:16,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.99 | bwd_microstep: 5260.47 | bwd_inner_microstep: 4853.51 | bwd_allreduce_microstep: 406.88 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 3351 [2024-07-29 16:57:25,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.76 | bwd_microstep: 5317.08 | bwd_inner_microstep: 5052.40 | bwd_allreduce_microstep: 264.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3776 [2024-07-29 16:57:33,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3238.13 | bwd_microstep: 4840.94 | bwd_inner_microstep: 4816.28 | bwd_allreduce_microstep: 24.60 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3816 [2024-07-29 16:57:42,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.30 | bwd_microstep: 5034.07 | bwd_inner_microstep: 5014.67 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3772 [2024-07-29 16:57:51,401] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.82 | bwd_microstep: 5034.60 | bwd_inner_microstep: 5011.63 | bwd_allreduce_microstep: 22.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3769 [2024-07-29 16:58:00,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.30 | bwd_microstep: 5098.55 | bwd_inner_microstep: 5056.22 | bwd_allreduce_microstep: 42.27 | step_microstep: 0.07 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3684 [2024-07-29 16:58:08,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.37 | bwd_microstep: 4884.42 | bwd_inner_microstep: 4864.96 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3722 [2024-07-29 16:58:17,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 16:58:17,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.69 | bwd_microstep: 5046.13 | bwd_inner_microstep: 5004.68 | bwd_allreduce_microstep: 41.39 | step_microstep: 180.95 [2024-07-29 16:58:17,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28754.26 | bwd: 40516.24 | bwd_inner: 39674.31 | bwd_allreduce: 841.46 | step: 181.53 40%|████ | 269/671 [5:15:03<7:49:25, 70.06s/it] {'loss': 1.1556, 'learning_rate': 1.3636264566806473e-05, 'epoch': 0.4} 40%|████ | 269/671 [5:15:03<7:49:25, 70.06s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2336 [2024-07-29 16:58:26,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.67 | bwd_microstep: 5382.61 | bwd_inner_microstep: 4967.13 | bwd_allreduce_microstep: 415.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3794 [2024-07-29 16:58:35,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.69 | 
bwd_microstep: 5186.88 | bwd_inner_microstep: 5136.22 | bwd_allreduce_microstep: 50.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3618 [2024-07-29 16:58:44,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.65 | bwd_microstep: 5139.72 | bwd_inner_microstep: 5064.25 | bwd_allreduce_microstep: 75.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3727 [2024-07-29 16:58:52,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3711.91 | bwd_microstep: 4988.09 | bwd_inner_microstep: 4968.74 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707 [2024-07-29 16:59:01,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3810.36 | bwd_microstep: 4970.03 | bwd_inner_microstep: 4940.96 | bwd_allreduce_microstep: 29.00 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171 [2024-07-29 16:59:10,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.36 | bwd_microstep: 5251.72 | bwd_inner_microstep: 4844.95 | bwd_allreduce_microstep: 406.70 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709 [2024-07-29 16:59:19,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.03 | bwd_microstep: 4923.82 | bwd_inner_microstep: 4898.98 | bwd_allreduce_microstep: 24.78 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3663 [2024-07-29 16:59:28,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.72 [2024-07-29 16:59:28,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.24 | bwd_microstep: 5029.56 | bwd_inner_microstep: 4965.90 | bwd_allreduce_microstep: 63.59 | step_microstep: 792.81 [2024-07-29 
16:59:28,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29293.82 | bwd: 40872.41 | bwd_inner: 39787.08 | bwd_allreduce: 1084.87 | step: 793.37 40%|████ | 270/671 [5:16:14<7:50:21, 70.38s/it] {'loss': 1.1988, 'learning_rate': 1.3591198663753358e-05, 'epoch': 0.4} 40%|████ | 270/671 [5:16:14<7:50:21, 70.38s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3937 [2024-07-29 16:59:37,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.35 | bwd_microstep: 5230.55 | bwd_inner_microstep: 5187.30 | bwd_allreduce_microstep: 43.18 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3800 [2024-07-29 16:59:46,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.88 | bwd_microstep: 5115.09 | bwd_inner_microstep: 5070.24 | bwd_allreduce_microstep: 44.78 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2325 [2024-07-29 16:59:54,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.02 | bwd_microstep: 5162.56 | bwd_inner_microstep: 4760.64 | bwd_allreduce_microstep: 401.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-29 17:00:03,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.00 | bwd_microstep: 5107.81 | bwd_inner_microstep: 5033.94 | bwd_allreduce_microstep: 73.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-29 17:00:12,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.48 | bwd_microstep: 5186.12 | bwd_inner_microstep: 5106.69 | bwd_allreduce_microstep: 79.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-29 17:00:21,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.48 | bwd_microstep: 
5094.97 | bwd_inner_microstep: 5037.25 | bwd_allreduce_microstep: 57.65 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-07-29 17:00:29,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.66 | bwd_microstep: 5021.08 | bwd_inner_microstep: 4959.33 | bwd_allreduce_microstep: 61.69 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-29 17:00:38,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.46 [2024-07-29 17:00:38,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3473.91 | bwd_microstep: 5063.76 | bwd_inner_microstep: 4671.84 | bwd_allreduce_microstep: 391.85 | step_microstep: 181.48 [2024-07-29 17:00:38,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28521.69 | bwd: 40981.93 | bwd_inner: 39827.18 | bwd_allreduce: 1154.28 | step: 182.09 40%|████ | 271/671 [5:17:24<7:48:05, 70.21s/it] {'loss': 1.1777, 'learning_rate': 1.354604887042536e-05, 'epoch': 0.4} 40%|████ | 271/671 [5:17:24<7:48:05, 70.21s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-07-29 17:00:47,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.57 | bwd_microstep: 5244.65 | bwd_inner_microstep: 5158.21 | bwd_allreduce_microstep: 86.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2255 [2024-07-29 17:00:56,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.55 | bwd_microstep: 5213.24 | bwd_inner_microstep: 4807.21 | bwd_allreduce_microstep: 405.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-29 17:01:04,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.08 | bwd_microstep: 5169.47 | bwd_inner_microstep: 5094.20 | bwd_allreduce_microstep: 75.20 | step_microstep: 
0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617 [2024-07-29 17:01:13,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.80 | bwd_microstep: 5097.87 | bwd_inner_microstep: 5026.76 | bwd_allreduce_microstep: 71.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674 [2024-07-29 17:01:22,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.69 | bwd_microstep: 5171.46 | bwd_inner_microstep: 5098.43 | bwd_allreduce_microstep: 72.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3768 [2024-07-29 17:01:31,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.03 | bwd_microstep: 5169.10 | bwd_inner_microstep: 5113.99 | bwd_allreduce_microstep: 55.04 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3670 [2024-07-29 17:01:39,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.48 | bwd_microstep: 4977.66 | bwd_inner_microstep: 4915.82 | bwd_allreduce_microstep: 61.76 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2167 [2024-07-29 17:01:48,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 17:01:48,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.66 | bwd_microstep: 5368.74 | bwd_inner_microstep: 4868.99 | bwd_allreduce_microstep: 499.68 | step_microstep: 208.46 [2024-07-29 17:01:48,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28596.77 | bwd: 41412.14 | bwd_inner: 40083.55 | bwd_allreduce: 1328.13 | step: 209.13 41%|████ | 272/671 [5:18:34<7:47:13, 70.26s/it] {'loss': 1.1831, 'learning_rate': 1.3500816241520059e-05, 'epoch': 0.4} 41%|████ | 272/671 [5:18:34<7:47:13, 70.26s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2332 
[2024-07-29 17:01:57,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.25 | bwd_microstep: 5235.09 | bwd_inner_microstep: 4829.91 | bwd_allreduce_microstep: 405.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3873 [2024-07-29 17:02:05,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3251.22 | bwd_microstep: 4909.18 | bwd_inner_microstep: 4889.84 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2279 [2024-07-29 17:02:13,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3050.17 | bwd_microstep: 5016.96 | bwd_inner_microstep: 4630.14 | bwd_allreduce_microstep: 386.76 | step_microstep: 0.08 dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 692 [2024-07-29 17:02:22,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.36 | bwd_microstep: 5177.20 | bwd_inner_microstep: 4778.53 | bwd_allreduce_microstep: 398.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3729 [2024-07-29 17:02:31,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.36 | bwd_microstep: 5143.71 | bwd_inner_microstep: 5089.62 | bwd_allreduce_microstep: 54.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-29 17:02:40,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.61 | bwd_microstep: 5437.77 | bwd_inner_microstep: 5377.98 | bwd_allreduce_microstep: 59.72 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3745 [2024-07-29 17:02:49,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.43 | bwd_microstep: 4996.48 | bwd_inner_microstep: 4977.06 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3719 [2024-07-29 17:02:58,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 17:02:58,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.92 | bwd_microstep: 4967.60 | bwd_inner_microstep: 4937.73 | bwd_allreduce_microstep: 29.80 | step_microstep: 482.54 [2024-07-29 17:02:58,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27866.23 | bwd: 40883.96 | bwd_inner: 39510.75 | bwd_allreduce: 1372.74 | step: 483.11 41%|████ | 273/671 [5:19:44<7:44:17, 69.99s/it] {'loss': 1.1756, 'learning_rate': 1.3455501833670089e-05, 'epoch': 0.41} 41%|████ | 273/671 [5:19:44<7:44:17, 69.99s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3958 [2024-07-29 17:03:07,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.53 | bwd_microstep: 5207.63 | bwd_inner_microstep: 5170.15 | bwd_allreduce_microstep: 37.42 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3794 [2024-07-29 17:03:15,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3122.33 | bwd_microstep: 4963.47 | bwd_inner_microstep: 4922.28 | bwd_allreduce_microstep: 41.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3771 [2024-07-29 17:03:24,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.36 | bwd_microstep: 5371.19 | bwd_inner_microstep: 5316.28 | bwd_allreduce_microstep: 54.84 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2204 [2024-07-29 17:03:32,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.00 | bwd_microstep: 5206.01 | bwd_inner_microstep: 4800.75 | bwd_allreduce_microstep: 405.19 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2104 [2024-07-29 17:03:41,571] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3487.90 | bwd_microstep: 5092.05 | bwd_inner_microstep: 4694.42 | bwd_allreduce_microstep: 397.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-29 17:03:50,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.80 | bwd_microstep: 5032.49 | bwd_inner_microstep: 4974.69 | bwd_allreduce_microstep: 57.74 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2141 [2024-07-29 17:03:58,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3471.47 | bwd_microstep: 5064.26 | bwd_inner_microstep: 4672.16 | bwd_allreduce_microstep: 392.04 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-29 17:04:07,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-29 17:04:07,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.61 | bwd_microstep: 4976.71 | bwd_inner_microstep: 4957.29 | bwd_allreduce_microstep: 19.34 | step_microstep: 180.84 [2024-07-29 17:04:07,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28237.91 | bwd: 40913.80 | bwd_inner: 39507.97 | bwd_allreduce: 1405.37 | step: 181.42 41%|████ | 274/671 [5:20:53<7:42:05, 69.84s/it] {'loss': 1.1909, 'learning_rate': 1.3410106705418424e-05, 'epoch': 0.41} 41%|████ | 274/671 [5:20:53<7:42:05, 69.84s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3897 [2024-07-29 17:04:16,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3887.36 | bwd_microstep: 5324.63 | bwd_inner_microstep: 5266.52 | bwd_allreduce_microstep: 58.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3883 [2024-07-29 17:04:25,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3797.49 | bwd_microstep: 
5123.41 | bwd_inner_microstep: 5104.05 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-29 17:04:34,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.50 | bwd_microstep: 5158.12 | bwd_inner_microstep: 5083.16 | bwd_allreduce_microstep: 74.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658 [2024-07-29 17:04:42,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3256.46 | bwd_microstep: 4841.19 | bwd_inner_microstep: 4803.04 | bwd_allreduce_microstep: 38.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721 [2024-07-29 17:04:51,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.28 | bwd_microstep: 5146.80 | bwd_inner_microstep: 5092.06 | bwd_allreduce_microstep: 54.68 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-29 17:05:00,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.55 | bwd_microstep: 4985.33 | bwd_inner_microstep: 4966.03 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-29 17:05:08,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.45 | bwd_microstep: 5053.71 | bwd_inner_microstep: 4998.95 | bwd_allreduce_microstep: 54.70 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3694 [2024-07-29 17:05:18,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 17:05:18,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.40 | bwd_microstep: 5048.40 | bwd_inner_microstep: 5006.91 | bwd_allreduce_microstep: 41.42 | step_microstep: 472.25 [2024-07-29 17:05:18,167] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29198.38 | bwd: 40681.56 | bwd_inner: 40320.66 | bwd_allreduce: 360.43 | step: 472.84 41%|████ | 275/671 [5:22:04<7:42:15, 70.04s/it] {'loss': 1.1974, 'learning_rate': 1.336463191719367e-05, 'epoch': 0.41} 41%|████ | 275/671 [5:22:04<7:42:15, 70.04s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3988 [2024-07-29 17:05:27,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.31 | bwd_microstep: 5388.68 | bwd_inner_microstep: 5339.15 | bwd_allreduce_microstep: 49.46 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3575 [2024-07-29 17:05:36,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3673.00 | bwd_microstep: 5308.59 | bwd_inner_microstep: 5167.83 | bwd_allreduce_microstep: 140.70 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3802 [2024-07-29 17:05:45,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3778.84 | bwd_microstep: 5091.98 | bwd_inner_microstep: 5066.21 | bwd_allreduce_microstep: 25.71 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746 [2024-07-29 17:05:53,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.22 | bwd_microstep: 5123.73 | bwd_inner_microstep: 5074.79 | bwd_allreduce_microstep: 48.88 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2189 [2024-07-29 17:06:02,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3049.10 | bwd_microstep: 5016.16 | bwd_inner_microstep: 4626.69 | bwd_allreduce_microstep: 389.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730 [2024-07-29 17:06:10,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.30 | bwd_microstep: 4980.33 | 
bwd_inner_microstep: 4960.98 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2143 [2024-07-29 17:06:19,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.92 | bwd_microstep: 5117.47 | bwd_inner_microstep: 4721.07 | bwd_allreduce_microstep: 396.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-07-29 17:06:28,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 17:06:28,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3475.07 | bwd_microstep: 5055.79 | bwd_inner_microstep: 4663.42 | bwd_allreduce_microstep: 392.31 | step_microstep: 180.81 [2024-07-29 17:06:28,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28572.66 | bwd: 41082.72 | bwd_inner: 39620.07 | bwd_allreduce: 1462.18 | step: 181.51 41%|████ | 276/671 [5:23:14<7:40:58, 70.02s/it] {'loss': 1.1798, 'learning_rate': 1.3319078531285286e-05, 'epoch': 0.41} 41%|████ | 276/671 [5:23:14<7:40:58, 70.02s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3918 [2024-07-29 17:06:37,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3682.34 | bwd_microstep: 5309.28 | bwd_inner_microstep: 5255.03 | bwd_allreduce_microstep: 54.19 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3799 [2024-07-29 17:06:45,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3136.00 | bwd_microstep: 5012.41 | bwd_inner_microstep: 4969.05 | bwd_allreduce_microstep: 43.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2204 [2024-07-29 17:06:54,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.80 | bwd_microstep: 5345.82 | bwd_inner_microstep: 4932.38 | bwd_allreduce_microstep: 413.38 | step_microstep: 0.08 
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-07-29 17:07:02,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.44 | bwd_microstep: 4886.12 | bwd_inner_microstep: 4866.71 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-07-29 17:07:11,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.01 | bwd_microstep: 5047.80 | bwd_inner_microstep: 4983.10 | bwd_allreduce_microstep: 64.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-07-29 17:07:20,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.08 | bwd_microstep: 5048.40 | bwd_inner_microstep: 4984.13 | bwd_allreduce_microstep: 64.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-29 17:07:28,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.08 | bwd_microstep: 5056.84 | bwd_inner_microstep: 4992.58 | bwd_allreduce_microstep: 64.19 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3667 [2024-07-29 17:07:37,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-29 17:07:37,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.89 | bwd_microstep: 4869.74 | bwd_inner_microstep: 4849.38 | bwd_allreduce_microstep: 20.29 | step_microstep: 181.19 [2024-07-29 17:07:37,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28484.55 | bwd: 40576.39 | bwd_inner: 39832.29 | bwd_allreduce: 743.62 | step: 181.77 41%|████▏ | 277/671 [5:24:23<7:38:33, 69.83s/it] {'loss': 1.1379, 'learning_rate': 1.3273447611818768e-05, 'epoch': 0.41} 41%|████▏ | 277/671 [5:24:23<7:38:33, 69.83s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3854 
[2024-07-29 17:07:46,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3774.90 | bwd_microstep: 5107.40 | bwd_inner_microstep: 5088.33 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3576 [2024-07-29 17:07:55,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.82 | bwd_microstep: 5159.19 | bwd_inner_microstep: 5059.42 | bwd_allreduce_microstep: 99.71 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621 [2024-07-29 17:08:04,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.83 | bwd_microstep: 5163.72 | bwd_inner_microstep: 5084.48 | bwd_allreduce_microstep: 79.16 | step_microstep: 0.07 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3749 [2024-07-29 17:08:12,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.50 | bwd_microstep: 5053.19 | bwd_inner_microstep: 4995.94 | bwd_allreduce_microstep: 57.18 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2192 [2024-07-29 17:08:21,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.46 | bwd_microstep: 5171.51 | bwd_inner_microstep: 4770.34 | bwd_allreduce_microstep: 401.10 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3708 [2024-07-29 17:08:29,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.18 | bwd_microstep: 5038.07 | bwd_inner_microstep: 4971.68 | bwd_allreduce_microstep: 66.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-29 17:08:38,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.08 | bwd_microstep: 4978.01 | bwd_inner_microstep: 4923.80 | bwd_allreduce_microstep: 54.14 | step_microstep: 0.08 dynamic ViT batch size: 
16, images per sample: 8.0, dynamic token length: 3675 [2024-07-29 17:08:47,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 17:08:47,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.24 | bwd_microstep: 4986.68 | bwd_inner_microstep: 4926.09 | bwd_allreduce_microstep: 60.53 | step_microstep: 181.29 [2024-07-29 17:08:47,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28691.90 | bwd: 40657.74 | bwd_inner: 39820.03 | bwd_allreduce: 837.26 | step: 181.84 41%|████▏ | 278/671 [5:25:33<7:37:05, 69.79s/it] {'loss': 1.1749, 'learning_rate': 1.3227740224730799e-05, 'epoch': 0.41} 41%|████▏ | 278/671 [5:25:33<7:37:05, 69.79s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2401 [2024-07-29 17:08:56,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.60 | bwd_microstep: 5544.18 | bwd_inner_microstep: 5133.75 | bwd_allreduce_microstep: 410.37 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768 [2024-07-29 17:09:05,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.75 | bwd_microstep: 5001.23 | bwd_inner_microstep: 4981.97 | bwd_allreduce_microstep: 19.20 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3640 [2024-07-29 17:09:13,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.23 | bwd_microstep: 5139.27 | bwd_inner_microstep: 5049.50 | bwd_allreduce_microstep: 89.70 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737 [2024-07-29 17:09:22,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.75 | bwd_microstep: 5019.75 | bwd_inner_microstep: 4995.41 | bwd_allreduce_microstep: 24.27 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2103 [2024-07-29 17:09:31,301] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.16 | bwd_microstep: 5169.47 | bwd_inner_microstep: 4768.99 | bwd_allreduce_microstep: 400.42 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2098 [2024-07-29 17:09:40,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.47 | bwd_microstep: 5179.68 | bwd_inner_microstep: 4779.54 | bwd_allreduce_microstep: 400.08 | step_microstep: 0.07 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-29 17:09:48,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.93 | bwd_microstep: 5158.13 | bwd_inner_microstep: 4758.05 | bwd_allreduce_microstep: 400.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-29 17:09:56,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-29 17:09:56,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3205.17 | bwd_microstep: 4764.21 | bwd_inner_microstep: 4732.82 | bwd_allreduce_microstep: 31.33 | step_microstep: 180.93 [2024-07-29 17:09:56,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28349.97 | bwd: 40975.91 | bwd_inner: 39199.96 | bwd_allreduce: 1775.48 | step: 181.49 42%|████▏ | 279/671 [5:26:42<7:35:40, 69.75s/it] {'loss': 1.1885, 'learning_rate': 1.3181957437744334e-05, 'epoch': 0.42} 42%|████▏ | 279/671 [5:26:42<7:35:40, 69.75s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3953 [2024-07-29 17:10:05,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3911.64 | bwd_microstep: 5168.71 | bwd_inner_microstep: 5149.53 | bwd_allreduce_microstep: 19.11 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3593 [2024-07-29 17:10:13,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3171.53 | 
bwd_microstep: 4678.60 | bwd_inner_microstep: 4654.56 | bwd_allreduce_microstep: 23.98 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3799 [2024-07-29 17:10:22,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3266.53 | bwd_microstep: 4978.11 | bwd_inner_microstep: 4958.59 | bwd_allreduce_microstep: 19.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3727 [2024-07-29 17:10:30,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.62 | bwd_microstep: 5225.40 | bwd_inner_microstep: 5168.57 | bwd_allreduce_microstep: 56.76 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2209 [2024-07-29 17:10:39,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3450.70 | bwd_microstep: 5015.64 | bwd_inner_microstep: 4626.59 | bwd_allreduce_microstep: 388.98 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-29 17:10:48,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.96 | bwd_microstep: 5097.10 | bwd_inner_microstep: 4699.78 | bwd_allreduce_microstep: 397.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-29 17:10:55,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3002.51 | bwd_microstep: 4890.44 | bwd_inner_microstep: 4512.31 | bwd_allreduce_microstep: 378.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-29 17:11:05,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 17:11:05,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.19 | bwd_microstep: 5048.29 | bwd_inner_microstep: 4992.14 | bwd_allreduce_microstep: 56.08 | step_microstep: 538.19 [2024-07-29 
17:11:05,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27478.59 | bwd: 40102.29 | bwd_inner: 38762.03 | bwd_allreduce: 1339.79 | step: 538.77 42%|████▏ | 280/671 [5:27:51<7:31:36, 69.30s/it] {'loss': 1.1213, 'learning_rate': 1.3136100320343674e-05, 'epoch': 0.42} 42%|████▏ | 280/671 [5:27:51<7:31:36, 69.30s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3547 [2024-07-29 17:11:13,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.10 | bwd_microstep: 5148.39 | bwd_inner_microstep: 5063.70 | bwd_allreduce_microstep: 84.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3800 [2024-07-29 17:11:22,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3795.75 | bwd_microstep: 5058.72 | bwd_inner_microstep: 5034.38 | bwd_allreduce_microstep: 24.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-29 17:11:31,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.02 | bwd_microstep: 5139.45 | bwd_inner_microstep: 5060.97 | bwd_allreduce_microstep: 78.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-29 17:11:40,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.43 | bwd_microstep: 5162.94 | bwd_inner_microstep: 5087.21 | bwd_allreduce_microstep: 75.66 | step_microstep: 0.07 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2121 [2024-07-29 17:11:48,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.19 | bwd_microstep: 5070.79 | bwd_inner_microstep: 4675.67 | bwd_allreduce_microstep: 395.06 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-07-29 17:11:57,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3506.73 | bwd_microstep: 
5145.14 | bwd_inner_microstep: 4743.38 | bwd_allreduce_microstep: 401.70 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2140 [2024-07-29 17:12:06,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.30 | bwd_microstep: 5116.43 | bwd_inner_microstep: 4720.02 | bwd_allreduce_microstep: 396.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-29 17:12:15,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-29 17:12:15,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.79 | bwd_microstep: 5045.46 | bwd_inner_microstep: 4986.89 | bwd_allreduce_microstep: 58.50 | step_microstep: 182.69 [2024-07-29 17:12:15,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28681.21 | bwd: 40887.31 | bwd_inner: 39372.17 | bwd_allreduce: 1514.67 | step: 183.36 42%|████▏ | 281/671 [5:29:00<7:31:37, 69.48s/it] {'loss': 1.1782, 'learning_rate': 1.3090169943749475e-05, 'epoch': 0.42} 42%|████▏ | 281/671 [5:29:00<7:31:37, 69.48s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3916 [2024-07-29 17:12:24,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3791.85 | bwd_microstep: 5278.04 | bwd_inner_microstep: 5259.01 | bwd_allreduce_microstep: 18.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3807 [2024-07-29 17:12:33,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3666.88 | bwd_microstep: 5318.21 | bwd_inner_microstep: 5250.35 | bwd_allreduce_microstep: 67.78 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2262 [2024-07-29 17:12:41,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.52 | bwd_microstep: 5204.92 | bwd_inner_microstep: 4799.78 | bwd_allreduce_microstep: 405.08 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3597 [2024-07-29 17:12:49,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3164.79 | bwd_microstep: 4726.36 | bwd_inner_microstep: 4691.65 | bwd_allreduce_microstep: 34.65 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3756 [2024-07-29 17:12:58,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.10 | bwd_microstep: 5131.77 | bwd_inner_microstep: 5056.17 | bwd_allreduce_microstep: 75.53 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3745 [2024-07-29 17:13:07,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.37 | bwd_microstep: 5155.08 | bwd_inner_microstep: 5101.92 | bwd_allreduce_microstep: 53.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-29 17:13:15,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.13 | bwd_microstep: 5010.22 | bwd_inner_microstep: 4959.84 | bwd_allreduce_microstep: 50.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-29 17:13:24,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.73 [2024-07-29 17:13:24,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3186.86 | bwd_microstep: 4722.51 | bwd_inner_microstep: 4698.70 | bwd_allreduce_microstep: 23.74 | step_microstep: 181.48 [2024-07-29 17:13:24,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28133.40 | bwd: 40547.09 | bwd_inner: 39817.38 | bwd_allreduce: 729.24 | step: 182.06 42%|████▏ | 282/671 [5:30:09<7:29:32, 69.34s/it] {'loss': 1.2075, 'learning_rate': 1.3044167380893726e-05, 'epoch': 0.42} 42%|████▏ | 282/671 [5:30:09<7:29:32, 69.34s/it]dynamic ViT batch size: 2, images per sample: 1.0, dynamic token 
length: 833 [2024-07-29 17:13:33,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.47 | bwd_microstep: 5459.28 | bwd_inner_microstep: 5037.82 | bwd_allreduce_microstep: 421.40 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2290 [2024-07-29 17:13:42,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.95 | bwd_microstep: 5334.72 | bwd_inner_microstep: 4918.42 | bwd_allreduce_microstep: 416.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2232 [2024-07-29 17:13:50,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.38 | bwd_microstep: 5250.18 | bwd_inner_microstep: 4843.71 | bwd_allreduce_microstep: 406.37 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3776 [2024-07-29 17:13:59,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.81 | bwd_microstep: 5119.34 | bwd_inner_microstep: 5056.40 | bwd_allreduce_microstep: 62.88 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2204 [2024-07-29 17:14:07,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3048.13 | bwd_microstep: 4991.85 | bwd_inner_microstep: 4606.39 | bwd_allreduce_microstep: 385.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-29 17:14:16,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.00 | bwd_microstep: 5122.89 | bwd_inner_microstep: 5069.40 | bwd_allreduce_microstep: 53.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2208 [2024-07-29 17:14:25,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.31 | bwd_microstep: 5093.75 | bwd_inner_microstep: 4699.07 | bwd_allreduce_microstep: 394.62 | step_microstep: 0.08 dynamic 
ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701 [2024-07-29 17:14:34,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-29 17:14:34,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.81 | bwd_microstep: 5236.08 | bwd_inner_microstep: 5081.25 | bwd_allreduce_microstep: 154.76 | step_microstep: 182.42 [2024-07-29 17:14:34,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28073.75 | bwd: 41608.06 | bwd_inner: 39312.37 | bwd_allreduce: 2295.17 | step: 182.98 42%|████▏ | 283/671 [5:31:20<7:29:40, 69.54s/it] {'loss': 1.1648, 'learning_rate': 1.2998093706394674e-05, 'epoch': 0.42} 42%|████▏ | 283/671 [5:31:20<7:29:40, 69.54s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2424 [2024-07-29 17:14:43,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.51 | bwd_microstep: 5327.15 | bwd_inner_microstep: 4918.30 | bwd_allreduce_microstep: 408.78 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2247 [2024-07-29 17:14:51,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.16 | bwd_microstep: 5218.32 | bwd_inner_microstep: 4812.88 | bwd_allreduce_microstep: 405.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-07-29 17:15:00,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.43 | bwd_microstep: 5165.71 | bwd_inner_microstep: 5111.13 | bwd_allreduce_microstep: 54.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3744 [2024-07-29 17:15:09,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.29 | bwd_microstep: 4981.52 | bwd_inner_microstep: 4949.18 | bwd_allreduce_microstep: 32.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-29 
17:15:17,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.64 | bwd_microstep: 5176.69 | bwd_inner_microstep: 5102.57 | bwd_allreduce_microstep: 74.05 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2212 [2024-07-29 17:15:26,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.18 | bwd_microstep: 5111.30 | bwd_inner_microstep: 4714.47 | bwd_allreduce_microstep: 396.77 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3702 [2024-07-29 17:15:35,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.60 | bwd_microstep: 5115.76 | bwd_inner_microstep: 5043.77 | bwd_allreduce_microstep: 71.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-29 17:15:44,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-29 17:15:44,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.44 | bwd_microstep: 5056.14 | bwd_inner_microstep: 4997.48 | bwd_allreduce_microstep: 58.60 | step_microstep: 182.73 [2024-07-29 17:15:44,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28609.15 | bwd: 41152.57 | bwd_inner: 39649.70 | bwd_allreduce: 1502.40 | step: 183.29 42%|████▏ | 284/671 [5:32:30<7:29:35, 69.70s/it] {'loss': 1.1872, 'learning_rate': 1.295194999653175e-05, 'epoch': 0.42} 42%|████▏ | 284/671 [5:32:30<7:29:35, 69.70s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3569 [2024-07-29 17:15:53,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.81 | bwd_microstep: 5291.54 | bwd_inner_microstep: 5123.27 | bwd_allreduce_microstep: 168.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3839 [2024-07-29 17:16:01,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3623.84 | bwd_microstep: 5196.43 | bwd_inner_microstep: 5145.17 | bwd_allreduce_microstep: 51.19 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3796 [2024-07-29 17:16:10,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.64 | bwd_microstep: 5061.06 | bwd_inner_microstep: 5035.25 | bwd_allreduce_microstep: 25.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-29 17:16:19,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.32 | bwd_microstep: 5030.50 | bwd_inner_microstep: 5007.33 | bwd_allreduce_microstep: 23.10 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3785 [2024-07-29 17:16:27,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3115.68 | bwd_microstep: 4946.76 | bwd_inner_microstep: 4904.95 | bwd_allreduce_microstep: 41.75 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3775 [2024-07-29 17:16:36,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3697.35 | bwd_microstep: 5102.46 | bwd_inner_microstep: 5070.03 | bwd_allreduce_microstep: 32.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-07-29 17:16:45,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.17 | bwd_microstep: 5218.67 | bwd_inner_microstep: 4814.00 | bwd_allreduce_microstep: 404.60 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-07-29 17:16:54,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 17:16:54,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.26 | bwd_microstep: 5098.41 | bwd_inner_microstep: 4701.59 | bwd_allreduce_microstep: 396.75 | step_microstep: 181.27 [2024-07-29 
17:16:54,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28680.97 | bwd: 40945.81 | bwd_inner: 39801.53 | bwd_allreduce: 1143.80 | step: 181.84 42%|████▏ | 285/671 [5:33:40<7:28:54, 69.78s/it] {'loss': 1.2484, 'learning_rate': 1.2905737329220394e-05, 'epoch': 0.42} 42%|████▏ | 285/671 [5:33:40<7:28:54, 69.78s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3915 [2024-07-29 17:17:03,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3802.54 | bwd_microstep: 5278.04 | bwd_inner_microstep: 5258.87 | bwd_allreduce_microstep: 19.11 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3804 [2024-07-29 17:17:12,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.40 | bwd_microstep: 5238.10 | bwd_inner_microstep: 5153.10 | bwd_allreduce_microstep: 84.94 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2286 [2024-07-29 17:17:20,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.96 | bwd_microstep: 5238.88 | bwd_inner_microstep: 4832.01 | bwd_allreduce_microstep: 406.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-29 17:17:29,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.63 | bwd_microstep: 5158.76 | bwd_inner_microstep: 5080.48 | bwd_allreduce_microstep: 78.21 | step_microstep: 0.07 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756 [2024-07-29 17:17:38,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.99 | bwd_microstep: 5015.86 | bwd_inner_microstep: 4995.51 | bwd_allreduce_microstep: 20.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2191 [2024-07-29 17:17:47,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.39 | bwd_microstep: 
5237.69 | bwd_inner_microstep: 4832.95 | bwd_allreduce_microstep: 404.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-29 17:17:55,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3208.32 | bwd_microstep: 5044.05 | bwd_inner_microstep: 5016.59 | bwd_allreduce_microstep: 27.40 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663 [2024-07-29 17:18:04,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 17:18:04,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.81 | bwd_microstep: 5224.46 | bwd_inner_microstep: 5164.38 | bwd_allreduce_microstep: 60.02 | step_microstep: 184.85 [2024-07-29 17:18:04,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28792.93 | bwd: 41435.83 | bwd_inner: 40333.82 | bwd_allreduce: 1101.53 | step: 185.43 43%|████▎ | 286/671 [5:34:50<7:29:15, 70.01s/it] {'loss': 1.1544, 'learning_rate': 1.2859456783986892e-05, 'epoch': 0.43} 43%|████▎ | 286/671 [5:34:50<7:29:15, 70.01s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3629 [2024-07-29 17:18:13,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.06 | bwd_microstep: 5261.95 | bwd_inner_microstep: 5173.19 | bwd_allreduce_microstep: 88.69 | step_microstep: 0.18 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3819 [2024-07-29 17:18:22,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.29 | bwd_microstep: 5153.32 | bwd_inner_microstep: 5098.12 | bwd_allreduce_microstep: 55.14 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3626 [2024-07-29 17:18:31,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.80 | bwd_microstep: 5396.83 | bwd_inner_microstep: 5292.89 | bwd_allreduce_microstep: 103.88 | 
step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2212 [2024-07-29 17:18:39,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3061.31 | bwd_microstep: 5290.88 | bwd_inner_microstep: 4900.88 | bwd_allreduce_microstep: 389.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3761 [2024-07-29 17:18:48,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.11 | bwd_microstep: 5169.22 | bwd_inner_microstep: 5116.45 | bwd_allreduce_microstep: 52.70 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717 [2024-07-29 17:18:57,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.88 | bwd_microstep: 4993.73 | bwd_inner_microstep: 4974.28 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3665 [2024-07-29 17:19:06,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.40 | bwd_microstep: 5156.79 | bwd_inner_microstep: 5062.00 | bwd_allreduce_microstep: 94.73 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3685 [2024-07-29 17:19:15,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.72 [2024-07-29 17:19:15,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.76 | bwd_microstep: 5225.11 | bwd_inner_microstep: 5205.71 | bwd_allreduce_microstep: 19.33 | step_microstep: 466.39 [2024-07-29 17:19:15,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28657.49 | bwd: 41647.82 | bwd_inner: 40823.46 | bwd_allreduce: 823.88 | step: 467.07 43%|████▎ | 287/671 [5:36:01<7:29:49, 70.28s/it] {'loss': 1.2163, 'learning_rate': 1.2813109441943166e-05, 'epoch': 0.43} 43%|████▎ | 287/671 [5:36:01<7:29:49, 70.28s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token 
length: 3918 [2024-07-29 17:19:24,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.10 | bwd_microstep: 5097.03 | bwd_inner_microstep: 5062.20 | bwd_allreduce_microstep: 34.76 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2043 [2024-07-29 17:19:33,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.44 | bwd_microstep: 5336.60 | bwd_inner_microstep: 4924.54 | bwd_allreduce_microstep: 411.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3804 [2024-07-29 17:19:42,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.44 | bwd_microstep: 5384.65 | bwd_inner_microstep: 5319.12 | bwd_allreduce_microstep: 65.46 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2226 [2024-07-29 17:19:51,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.53 | bwd_microstep: 5227.30 | bwd_inner_microstep: 4822.42 | bwd_allreduce_microstep: 404.82 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2223 [2024-07-29 17:19:59,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.18 | bwd_microstep: 5222.36 | bwd_inner_microstep: 4817.53 | bwd_allreduce_microstep: 404.77 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3738 [2024-07-29 17:20:08,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.74 | bwd_microstep: 5072.46 | bwd_inner_microstep: 5042.86 | bwd_allreduce_microstep: 29.54 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2166 [2024-07-29 17:20:17,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.48 | bwd_microstep: 5077.43 | bwd_inner_microstep: 4681.46 | bwd_allreduce_microstep: 395.90 | step_microstep: 0.08 dynamic 
ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-29 17:20:26,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.45 [2024-07-29 17:20:26,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3489.91 | bwd_microstep: 5151.03 | bwd_inner_microstep: 4750.37 | bwd_allreduce_microstep: 400.59 | step_microstep: 181.00 [2024-07-29 17:20:26,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28769.72 | bwd: 41568.85 | bwd_inner: 39420.44 | bwd_allreduce: 2147.93 | step: 181.57 43%|████▎ | 288/671 [5:37:12<7:29:22, 70.40s/it] {'loss': 1.1837, 'learning_rate': 1.2766696385761494e-05, 'epoch': 0.43} 43%|████▎ | 288/671 [5:37:12<7:29:22, 70.40s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3544 [2024-07-29 17:20:35,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3673.52 | bwd_microstep: 5351.05 | bwd_inner_microstep: 5201.42 | bwd_allreduce_microstep: 149.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3789 [2024-07-29 17:20:43,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.25 | bwd_microstep: 5113.87 | bwd_inner_microstep: 5071.40 | bwd_allreduce_microstep: 42.41 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2224 [2024-07-29 17:20:52,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.03 | bwd_microstep: 5225.74 | bwd_inner_microstep: 4819.48 | bwd_allreduce_microstep: 406.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3629 [2024-07-29 17:21:01,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.54 | bwd_microstep: 5113.86 | bwd_inner_microstep: 5043.63 | bwd_allreduce_microstep: 70.16 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-29 
17:21:10,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.86 | bwd_microstep: 5049.08 | bwd_inner_microstep: 5007.23 | bwd_allreduce_microstep: 41.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705 [2024-07-29 17:21:18,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.06 | bwd_microstep: 4899.28 | bwd_inner_microstep: 4880.02 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725 [2024-07-29 17:21:27,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.41 | bwd_microstep: 5087.27 | bwd_inner_microstep: 5043.29 | bwd_allreduce_microstep: 43.91 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 [2024-07-29 17:21:36,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.73 [2024-07-29 17:21:36,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.70 | bwd_microstep: 4982.09 | bwd_inner_microstep: 4954.30 | bwd_allreduce_microstep: 27.72 | step_microstep: 181.47 [2024-07-29 17:21:36,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29161.29 | bwd: 40822.21 | bwd_inner: 40020.71 | bwd_allreduce: 801.04 | step: 182.04 43%|████▎ | 289/671 [5:38:22<7:28:02, 70.37s/it] {'loss': 1.153, 'learning_rate': 1.2720218699649243e-05, 'epoch': 0.43} 43%|████▎ | 289/671 [5:38:22<7:28:02, 70.37s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3831 [2024-07-29 17:21:45,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3805.88 | bwd_microstep: 5173.11 | bwd_inner_microstep: 5142.34 | bwd_allreduce_microstep: 30.70 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3589 [2024-07-29 17:21:54,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3564.23 | bwd_microstep: 5117.25 | bwd_inner_microstep: 5037.13 | bwd_allreduce_microstep: 80.05 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3784 [2024-07-29 17:22:02,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.60 | bwd_microstep: 5083.53 | bwd_inner_microstep: 5043.00 | bwd_allreduce_microstep: 40.46 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3765 [2024-07-29 17:22:11,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.29 | bwd_microstep: 5024.27 | bwd_inner_microstep: 5003.89 | bwd_allreduce_microstep: 20.31 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3729 [2024-07-29 17:22:20,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.44 | bwd_microstep: 5141.05 | bwd_inner_microstep: 5064.81 | bwd_allreduce_microstep: 76.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621 [2024-07-29 17:22:28,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3186.27 | bwd_microstep: 4677.90 | bwd_inner_microstep: 4654.34 | bwd_allreduce_microstep: 23.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-29 17:22:37,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.72 | bwd_microstep: 5162.74 | bwd_inner_microstep: 5086.51 | bwd_allreduce_microstep: 76.15 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-07-29 17:22:45,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 17:22:45,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3387.27 | bwd_microstep: 4944.97 | bwd_inner_microstep: 4904.39 | bwd_allreduce_microstep: 40.51 | step_microstep: 181.44 [2024-07-29 
17:22:45,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28516.61 | bwd: 40324.79 | bwd_inner: 39936.36 | bwd_allreduce: 387.96 | step: 182.12 43%|████▎ | 290/671 [5:39:31<7:24:35, 70.01s/it] {'loss': 1.2074, 'learning_rate': 1.2673677469323535e-05, 'epoch': 0.43} 43%|████▎ | 290/671 [5:39:31<7:24:35, 70.01s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-29 17:22:54,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.81 | bwd_microstep: 5351.55 | bwd_inner_microstep: 5278.71 | bwd_allreduce_microstep: 72.77 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2290 [2024-07-29 17:23:03,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.30 | bwd_microstep: 5283.09 | bwd_inner_microstep: 4873.94 | bwd_allreduce_microstep: 409.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2311 [2024-07-29 17:23:12,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.80 | bwd_microstep: 5160.96 | bwd_inner_microstep: 4759.41 | bwd_allreduce_microstep: 401.49 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3750 [2024-07-29 17:23:20,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3135.53 | bwd_microstep: 4720.29 | bwd_inner_microstep: 4700.95 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3646 [2024-07-29 17:23:28,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.42 | bwd_microstep: 5169.09 | bwd_inner_microstep: 5075.39 | bwd_allreduce_microstep: 93.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3718 [2024-07-29 17:23:37,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.16 | bwd_microstep: 
5137.96 | bwd_inner_microstep: 5084.62 | bwd_allreduce_microstep: 53.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-07-29 17:23:46,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3479.59 | bwd_microstep: 5081.54 | bwd_inner_microstep: 4688.23 | bwd_allreduce_microstep: 393.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2151 [2024-07-29 17:23:55,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 17:23:55,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.56 | bwd_microstep: 5115.35 | bwd_inner_microstep: 4719.29 | bwd_allreduce_microstep: 395.99 | step_microstep: 182.30 [2024-07-29 17:23:55,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28060.09 | bwd: 41019.81 | bwd_inner: 39180.48 | bwd_allreduce: 1838.85 | step: 182.89 43%|████▎ | 291/671 [5:40:41<7:22:17, 69.84s/it] {'loss': 1.1652, 'learning_rate': 1.2627073781985873e-05, 'epoch': 0.43} 43%|████▎ | 291/671 [5:40:41<7:22:17, 69.84s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2385 [2024-07-29 17:24:03,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.45 | bwd_microstep: 5296.86 | bwd_inner_microstep: 4896.18 | bwd_allreduce_microstep: 400.61 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3042 [2024-07-29 17:24:12,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.36 | bwd_microstep: 5112.79 | bwd_inner_microstep: 4827.96 | bwd_allreduce_microstep: 284.76 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755 [2024-07-29 17:24:21,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.05 | bwd_microstep: 5084.57 | bwd_inner_microstep: 5055.82 | bwd_allreduce_microstep: 28.69 | 
step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3135 [2024-07-29 17:24:30,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.03 | bwd_microstep: 5198.97 | bwd_inner_microstep: 4916.09 | bwd_allreduce_microstep: 282.81 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3745 [2024-07-29 17:24:39,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.81 | bwd_microstep: 5019.54 | bwd_inner_microstep: 4995.53 | bwd_allreduce_microstep: 23.93 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722 [2024-07-29 17:24:47,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.27 | bwd_microstep: 5020.66 | bwd_inner_microstep: 4997.25 | bwd_allreduce_microstep: 23.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-29 17:24:56,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.45 | bwd_microstep: 5013.84 | bwd_inner_microstep: 4961.08 | bwd_allreduce_microstep: 52.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-29 17:25:04,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 17:25:04,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3193.58 | bwd_microstep: 4716.01 | bwd_inner_microstep: 4691.88 | bwd_allreduce_microstep: 24.07 | step_microstep: 180.95 [2024-07-29 17:25:04,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28666.91 | bwd: 40463.22 | bwd_inner: 39341.73 | bwd_allreduce: 1121.01 | step: 181.52 44%|████▎ | 292/671 [5:41:50<7:20:24, 69.72s/it] {'loss': 1.2019, 'learning_rate': 1.258040872629676e-05, 'epoch': 0.43} 44%|████▎ | 292/671 [5:41:50<7:20:24, 69.72s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token 
length: 2380 [2024-07-29 17:25:13,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.06 | bwd_microstep: 5605.95 | bwd_inner_microstep: 5189.02 | bwd_allreduce_microstep: 416.86 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3802 [2024-07-29 17:25:22,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.41 | bwd_microstep: 5183.01 | bwd_inner_microstep: 5112.91 | bwd_allreduce_microstep: 70.03 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3077 [2024-07-29 17:25:31,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.23 | bwd_microstep: 5166.56 | bwd_inner_microstep: 4883.52 | bwd_allreduce_microstep: 282.97 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3248 [2024-07-29 17:25:39,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3085.02 | bwd_microstep: 4906.27 | bwd_inner_microstep: 4816.93 | bwd_allreduce_microstep: 89.28 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3632 [2024-07-29 17:25:48,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.01 | bwd_microstep: 5197.62 | bwd_inner_microstep: 5105.59 | bwd_allreduce_microstep: 91.97 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2207 [2024-07-29 17:25:56,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.03 | bwd_microstep: 5092.13 | bwd_inner_microstep: 4697.70 | bwd_allreduce_microstep: 394.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667 [2024-07-29 17:26:05,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3246.20 | bwd_microstep: 4817.79 | bwd_inner_microstep: 4780.76 | bwd_allreduce_microstep: 36.95 | step_microstep: 0.08 dynamic 
ViT batch size: 10, images per sample: 5.0, dynamic token length: 2176 [2024-07-29 17:26:13,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 17:26:13,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3499.48 | bwd_microstep: 5081.48 | bwd_inner_microstep: 4687.55 | bwd_allreduce_microstep: 393.86 | step_microstep: 181.30 [2024-07-29 17:26:13,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27824.34 | bwd: 41050.78 | bwd_inner: 39273.92 | bwd_allreduce: 1776.39 | step: 181.85 44%|████▎ | 293/671 [5:42:59<7:18:15, 69.56s/it] {'loss': 1.1428, 'learning_rate': 1.2533683392350264e-05, 'epoch': 0.44} 44%|████▎ | 293/671 [5:42:59<7:18:15, 69.56s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2492 [2024-07-29 17:26:22,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.37 | bwd_microstep: 5432.36 | bwd_inner_microstep: 5016.08 | bwd_allreduce_microstep: 416.21 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2223 [2024-07-29 17:26:31,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.74 | bwd_microstep: 5156.89 | bwd_inner_microstep: 4754.57 | bwd_allreduce_microstep: 402.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3801 [2024-07-29 17:26:40,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.42 | bwd_microstep: 5020.96 | bwd_inner_microstep: 5001.62 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-07-29 17:26:49,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.53 | bwd_microstep: 5219.54 | bwd_inner_microstep: 4815.59 | bwd_allreduce_microstep: 403.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737 [2024-07-29 
17:26:57,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.81 | bwd_microstep: 5090.85 | bwd_inner_microstep: 5048.09 | bwd_allreduce_microstep: 42.69 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3674 [2024-07-29 17:27:06,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.66 | bwd_microstep: 4805.72 | bwd_inner_microstep: 4786.26 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3652 [2024-07-29 17:27:14,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.24 | bwd_microstep: 5160.73 | bwd_inner_microstep: 5069.92 | bwd_allreduce_microstep: 90.74 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2134 [2024-07-29 17:27:23,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 17:27:23,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.11 | bwd_microstep: 5097.46 | bwd_inner_microstep: 4702.53 | bwd_allreduce_microstep: 394.86 | step_microstep: 182.36 [2024-07-29 17:27:23,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28658.79 | bwd: 40984.48 | bwd_inner: 39194.60 | bwd_allreduce: 1789.40 | step: 183.07 44%|████▍ | 294/671 [5:44:09<7:17:52, 69.69s/it] {'loss': 1.1073, 'learning_rate': 1.2486898871648547e-05, 'epoch': 0.44} 44%|████▍ | 294/671 [5:44:09<7:17:52, 69.69s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2004 [2024-07-29 17:27:32,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.46 | bwd_microstep: 5229.03 | bwd_inner_microstep: 4825.44 | bwd_allreduce_microstep: 403.52 | step_microstep: 0.08 dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 791 [2024-07-29 17:27:41,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.58 
| bwd_microstep: 5403.76 | bwd_inner_microstep: 4989.29 | bwd_allreduce_microstep: 414.41 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3750 [2024-07-29 17:27:50,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.37 | bwd_microstep: 5002.16 | bwd_inner_microstep: 4978.11 | bwd_allreduce_microstep: 23.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3746 [2024-07-29 17:27:58,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.02 | bwd_microstep: 5008.30 | bwd_inner_microstep: 4988.94 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2125 [2024-07-29 17:28:07,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.33 | bwd_microstep: 5168.25 | bwd_inner_microstep: 4767.66 | bwd_allreduce_microstep: 400.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3747 [2024-07-29 17:28:16,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.41 | bwd_microstep: 4992.69 | bwd_inner_microstep: 4973.34 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644 [2024-07-29 17:28:25,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.08 | bwd_microstep: 5167.45 | bwd_inner_microstep: 5087.88 | bwd_allreduce_microstep: 79.49 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154 [2024-07-29 17:28:33,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 17:28:33,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3487.67 | bwd_microstep: 5055.62 | bwd_inner_microstep: 4663.39 | bwd_allreduce_microstep: 392.16 | step_microstep: 182.76 [2024-07-29 
17:28:33,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28874.83 | bwd: 41027.21 | bwd_inner: 39273.98 | bwd_allreduce: 1752.74 | step: 183.35 44%|████▍ | 295/671 [5:45:19<7:17:43, 69.85s/it] {'loss': 1.2043, 'learning_rate': 1.2440056257076374e-05, 'epoch': 0.44} 44%|████▍ | 295/671 [5:45:19<7:17:43, 69.85s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3929 [2024-07-29 17:28:42,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3796.31 | bwd_microstep: 5188.13 | bwd_inner_microstep: 5168.93 | bwd_allreduce_microstep: 19.13 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2016 [2024-07-29 17:28:51,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.79 | bwd_microstep: 5251.60 | bwd_inner_microstep: 4843.70 | bwd_allreduce_microstep: 407.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3592 [2024-07-29 17:29:00,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.41 | bwd_microstep: 5176.45 | bwd_inner_microstep: 5100.67 | bwd_allreduce_microstep: 75.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3600 [2024-07-29 17:29:09,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.43 | bwd_microstep: 5128.24 | bwd_inner_microstep: 5057.29 | bwd_allreduce_microstep: 70.88 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3731 [2024-07-29 17:29:18,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.21 | bwd_microstep: 5164.17 | bwd_inner_microstep: 5109.20 | bwd_allreduce_microstep: 54.91 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2190 [2024-07-29 17:29:26,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3482.53 | bwd_microstep: 
5133.27 | bwd_inner_microstep: 4734.77 | bwd_allreduce_microstep: 398.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-07-29 17:29:35,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.50 | bwd_microstep: 5032.08 | bwd_inner_microstep: 4974.56 | bwd_allreduce_microstep: 57.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701 [2024-07-29 17:29:43,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 17:29:43,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3195.99 | bwd_microstep: 4699.84 | bwd_inner_microstep: 4678.44 | bwd_allreduce_microstep: 21.34 | step_microstep: 182.67 [2024-07-29 17:29:43,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28399.07 | bwd: 40773.77 | bwd_inner: 39667.52 | bwd_allreduce: 1105.79 | step: 183.23 44%|████▍ | 296/671 [5:46:29<7:15:54, 69.75s/it] {'loss': 1.1542, 'learning_rate': 1.2393156642875579e-05, 'epoch': 0.44} 44%|████▍ | 296/671 [5:46:29<7:15:54, 69.75s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3825 [2024-07-29 17:29:52,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.35 | bwd_microstep: 5135.93 | bwd_inner_microstep: 5096.02 | bwd_allreduce_microstep: 39.85 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3789 [2024-07-29 17:30:00,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3111.10 | bwd_microstep: 4882.87 | bwd_inner_microstep: 4849.12 | bwd_allreduce_microstep: 33.69 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3594 [2024-07-29 17:30:09,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.99 | bwd_microstep: 5163.21 | bwd_inner_microstep: 5067.49 | bwd_allreduce_microstep: 95.66 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-29 17:30:17,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.50 | bwd_microstep: 5095.61 | bwd_inner_microstep: 5026.65 | bwd_allreduce_microstep: 68.89 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3639 [2024-07-29 17:30:26,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.24 | bwd_microstep: 4999.08 | bwd_inner_microstep: 4924.20 | bwd_allreduce_microstep: 74.81 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704 [2024-07-29 17:30:34,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3693.17 | bwd_microstep: 4898.36 | bwd_inner_microstep: 4879.03 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3669 [2024-07-29 17:30:43,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3695.06 | bwd_microstep: 4883.64 | bwd_inner_microstep: 4864.31 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-29 17:30:52,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 17:30:52,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.18 | bwd_microstep: 5051.34 | bwd_inner_microstep: 4992.79 | bwd_allreduce_microstep: 58.49 | step_microstep: 181.46 [2024-07-29 17:30:52,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28441.50 | bwd: 40110.02 | bwd_inner: 39699.54 | bwd_allreduce: 410.01 | step: 182.03 44%|████▍ | 297/671 [5:47:38<7:13:07, 69.48s/it] {'loss': 1.1572, 'learning_rate': 1.2346201124619502e-05, 'epoch': 0.44} 44%|████▍ | 297/671 [5:47:38<7:13:07, 69.48s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token 
length: 3834 [2024-07-29 17:31:01,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3670.26 | bwd_microstep: 5313.81 | bwd_inner_microstep: 5247.56 | bwd_allreduce_microstep: 66.18 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3809 [2024-07-29 17:31:10,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3794.19 | bwd_microstep: 5101.09 | bwd_inner_microstep: 5074.29 | bwd_allreduce_microstep: 26.73 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621 [2024-07-29 17:31:19,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.53 | bwd_microstep: 5240.20 | bwd_inner_microstep: 5155.01 | bwd_allreduce_microstep: 85.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621 [2024-07-29 17:31:27,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3205.48 | bwd_microstep: 4826.95 | bwd_inner_microstep: 4786.57 | bwd_allreduce_microstep: 40.31 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2101 [2024-07-29 17:31:35,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.41 | bwd_microstep: 5077.48 | bwd_inner_microstep: 4682.95 | bwd_allreduce_microstep: 394.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-29 17:31:44,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.86 | bwd_microstep: 5014.29 | bwd_inner_microstep: 4991.39 | bwd_allreduce_microstep: 22.84 | step_microstep: 0.18 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3676 [2024-07-29 17:31:53,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.87 | bwd_microstep: 5085.32 | bwd_inner_microstep: 5003.32 | bwd_allreduce_microstep: 81.94 | step_microstep: 0.08 dynamic 
ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-29 17:32:02,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 17:32:02,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3498.53 | bwd_microstep: 5074.20 | bwd_inner_microstep: 4681.80 | bwd_allreduce_microstep: 392.33 | step_microstep: 181.07 [2024-07-29 17:32:02,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28632.04 | bwd: 40733.32 | bwd_inner: 39622.82 | bwd_allreduce: 1110.02 | step: 181.75 44%|████▍ | 298/671 [5:48:48<7:12:21, 69.55s/it] {'loss': 1.1742, 'learning_rate': 1.2299190799187405e-05, 'epoch': 0.44} 44%|████▍ | 298/671 [5:48:48<7:12:21, 69.55s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3869 [2024-07-29 17:32:11,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3848.62 | bwd_microstep: 5328.08 | bwd_inner_microstep: 5283.87 | bwd_allreduce_microstep: 44.14 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2274 [2024-07-29 17:32:20,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.58 | bwd_microstep: 5348.10 | bwd_inner_microstep: 4933.68 | bwd_allreduce_microstep: 414.27 | step_microstep: 0.20 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-29 17:32:28,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3204.90 | bwd_microstep: 4849.72 | bwd_inner_microstep: 4799.40 | bwd_allreduce_microstep: 50.25 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3738 [2024-07-29 17:32:37,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.80 | bwd_microstep: 5191.24 | bwd_inner_microstep: 5134.63 | bwd_allreduce_microstep: 56.55 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-07-29 
17:32:45,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.30 | bwd_microstep: 5036.49 | bwd_inner_microstep: 5012.44 | bwd_allreduce_microstep: 23.98 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2224 [2024-07-29 17:32:54,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3476.06 | bwd_microstep: 5062.82 | bwd_inner_microstep: 4669.38 | bwd_allreduce_microstep: 393.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2187 [2024-07-29 17:33:03,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.53 | bwd_microstep: 5156.39 | bwd_inner_microstep: 4755.56 | bwd_allreduce_microstep: 400.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-29 17:33:12,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 17:33:12,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.01 | bwd_microstep: 5080.41 | bwd_inner_microstep: 5017.39 | bwd_allreduce_microstep: 62.96 | step_microstep: 181.07 [2024-07-29 17:33:12,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28663.70 | bwd: 41053.25 | bwd_inner: 39606.28 | bwd_allreduce: 1446.45 | step: 181.76 45%|████▍ | 299/671 [5:49:58<7:12:07, 69.70s/it] {'loss': 1.2058, 'learning_rate': 1.2252126764738845e-05, 'epoch': 0.45} 45%|████▍ | 299/671 [5:49:58<7:12:07, 69.70s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3975 [2024-07-29 17:33:21,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3853.87 | bwd_microstep: 5242.44 | bwd_inner_microstep: 5223.35 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2373 [2024-07-29 17:33:30,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3585.55 | bwd_microstep: 5265.18 | bwd_inner_microstep: 4855.75 | bwd_allreduce_microstep: 409.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3594 [2024-07-29 17:33:38,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.71 | bwd_microstep: 5189.88 | bwd_inner_microstep: 5108.63 | bwd_allreduce_microstep: 81.18 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2087 [2024-07-29 17:33:47,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.19 | bwd_microstep: 5203.00 | bwd_inner_microstep: 4796.81 | bwd_allreduce_microstep: 406.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677 [2024-07-29 17:33:56,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.00 | bwd_microstep: 5173.84 | bwd_inner_microstep: 5094.42 | bwd_allreduce_microstep: 79.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2189 [2024-07-29 17:34:05,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.61 | bwd_microstep: 5214.42 | bwd_inner_microstep: 4807.67 | bwd_allreduce_microstep: 406.69 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2119 [2024-07-29 17:34:14,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.46 | bwd_microstep: 5242.09 | bwd_inner_microstep: 4835.14 | bwd_allreduce_microstep: 406.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-29 17:34:22,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.46 [2024-07-29 17:34:22,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.47 | bwd_microstep: 5051.81 | bwd_inner_microstep: 4993.14 | bwd_allreduce_microstep: 58.60 | step_microstep: 181.04 [2024-07-29 
17:34:22,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28874.77 | bwd: 41582.64 | bwd_inner: 39714.83 | bwd_allreduce: 1867.32 | step: 181.61 45%|████▍ | 300/671 [5:51:08<7:12:58, 70.02s/it] {'loss': 1.1983, 'learning_rate': 1.2205010120688012e-05, 'epoch': 0.45} 45%|████▍ | 300/671 [5:51:08<7:12:58, 70.02s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3866 [2024-07-29 17:34:31,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3797.37 | bwd_microstep: 5135.11 | bwd_inner_microstep: 5110.77 | bwd_allreduce_microstep: 24.28 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2036 [2024-07-29 17:34:40,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.57 | bwd_microstep: 5196.36 | bwd_inner_microstep: 4792.36 | bwd_allreduce_microstep: 403.94 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2281 [2024-07-29 17:34:49,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.37 | bwd_microstep: 5221.60 | bwd_inner_microstep: 4815.41 | bwd_allreduce_microstep: 406.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3766 [2024-07-29 17:34:58,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.25 | bwd_microstep: 5216.50 | bwd_inner_microstep: 5154.00 | bwd_allreduce_microstep: 62.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-29 17:35:06,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.93 | bwd_microstep: 5025.88 | bwd_inner_microstep: 4986.81 | bwd_allreduce_microstep: 39.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2196 [2024-07-29 17:35:15,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.32 | bwd_microstep: 
5229.67 | bwd_inner_microstep: 4827.07 | bwd_allreduce_microstep: 402.53 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-07-29 17:35:24,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.91 | bwd_microstep: 4988.83 | bwd_inner_microstep: 4969.43 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145 [2024-07-29 17:35:33,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 17:35:33,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3489.79 | bwd_microstep: 5062.27 | bwd_inner_microstep: 4668.87 | bwd_allreduce_microstep: 393.33 | step_microstep: 181.10 [2024-07-29 17:35:33,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28787.41 | bwd: 41076.22 | bwd_inner: 39324.67 | bwd_allreduce: 1751.08 | step: 181.69 45%|████▍ | 301/671 [5:52:19<7:12:06, 70.07s/it] {'loss': 1.171, 'learning_rate': 1.2157841967678064e-05, 'epoch': 0.45} 45%|████▍ | 301/671 [5:52:19<7:12:06, 70.07s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3547 [2024-07-29 17:35:42,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3647.69 | bwd_microstep: 5708.20 | bwd_inner_microstep: 5442.92 | bwd_allreduce_microstep: 265.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3568 [2024-07-29 17:35:51,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.57 | bwd_microstep: 5225.01 | bwd_inner_microstep: 5133.00 | bwd_allreduce_microstep: 91.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-29 17:36:00,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.48 | bwd_microstep: 5155.84 | bwd_inner_microstep: 5079.56 | bwd_allreduce_microstep: 76.21 | 
step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2101 [2024-07-29 17:36:08,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.58 | bwd_microstep: 5282.68 | bwd_inner_microstep: 4874.05 | bwd_allreduce_microstep: 408.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3761 [2024-07-29 17:36:17,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.50 | bwd_microstep: 5105.84 | bwd_inner_microstep: 5068.23 | bwd_allreduce_microstep: 37.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-07-29 17:36:26,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.30 | bwd_microstep: 5116.42 | bwd_inner_microstep: 4718.78 | bwd_allreduce_microstep: 397.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-29 17:36:34,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3236.46 | bwd_microstep: 4867.01 | bwd_inner_microstep: 4824.97 | bwd_allreduce_microstep: 41.98 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2162 [2024-07-29 17:36:43,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 17:36:43,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.20 | bwd_microstep: 5119.56 | bwd_inner_microstep: 4720.12 | bwd_allreduce_microstep: 399.38 | step_microstep: 181.80 [2024-07-29 17:36:43,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28328.70 | bwd: 41580.55 | bwd_inner: 39861.58 | bwd_allreduce: 1718.51 | step: 182.37 45%|████▌ | 302/671 [5:53:29<7:11:14, 70.12s/it] {'loss': 1.1433, 'learning_rate': 1.2110623407555398e-05, 'epoch': 0.45} 45%|████▌ | 302/671 [5:53:29<7:11:14, 70.12s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token 
length: 2347 [2024-07-29 17:36:52,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.82 | bwd_microstep: 5289.52 | bwd_inner_microstep: 4880.45 | bwd_allreduce_microstep: 409.00 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2242 [2024-07-29 17:37:00,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3060.13 | bwd_microstep: 5077.17 | bwd_inner_microstep: 4685.34 | bwd_allreduce_microstep: 391.76 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2212 [2024-07-29 17:37:09,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.26 | bwd_microstep: 5152.69 | bwd_inner_microstep: 4752.58 | bwd_allreduce_microstep: 400.05 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-29 17:37:17,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3210.19 | bwd_microstep: 4820.38 | bwd_inner_microstep: 4778.07 | bwd_allreduce_microstep: 42.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631 [2024-07-29 17:37:25,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.03 | bwd_microstep: 5191.84 | bwd_inner_microstep: 5106.69 | bwd_allreduce_microstep: 85.08 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2176 [2024-07-29 17:37:34,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.89 | bwd_microstep: 5110.12 | bwd_inner_microstep: 4712.54 | bwd_allreduce_microstep: 397.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3693 [2024-07-29 17:37:43,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.47 | bwd_microstep: 4992.99 | bwd_inner_microstep: 4943.29 | bwd_allreduce_microstep: 49.63 | step_microstep: 0.08 dynamic 
ViT batch size: 20, images per sample: 10.0, dynamic token length: 3672 [2024-07-29 17:37:52,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 17:37:52,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.95 | bwd_microstep: 5176.97 | bwd_inner_microstep: 5100.55 | bwd_allreduce_microstep: 76.35 | step_microstep: 180.87 [2024-07-29 17:37:52,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27629.64 | bwd: 40811.67 | bwd_inner: 38959.46 | bwd_allreduce: 1851.74 | step: 181.54 45%|████▌ | 303/671 [5:54:38<7:07:35, 69.72s/it] {'loss': 1.2067, 'learning_rate': 1.2063355543343925e-05, 'epoch': 0.45} 45%|████▌ | 303/671 [5:54:38<7:07:35, 69.72s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2092 [2024-07-29 17:38:01,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.98 | bwd_microstep: 5400.00 | bwd_inner_microstep: 4983.78 | bwd_allreduce_microstep: 416.15 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3577 [2024-07-29 17:38:09,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.87 | bwd_microstep: 5118.83 | bwd_inner_microstep: 5037.47 | bwd_allreduce_microstep: 81.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3763 [2024-07-29 17:38:18,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.94 | bwd_microstep: 5036.13 | bwd_inner_microstep: 5010.01 | bwd_allreduce_microstep: 26.06 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2078 [2024-07-29 17:38:27,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.94 | bwd_microstep: 5258.28 | bwd_inner_microstep: 4850.55 | bwd_allreduce_microstep: 407.67 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3761 [2024-07-29 
17:38:36,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.60 | bwd_microstep: 4984.04 | bwd_inner_microstep: 4964.63 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2195 [2024-07-29 17:38:44,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3311.62 | bwd_microstep: 5001.53 | bwd_inner_microstep: 4617.98 | bwd_allreduce_microstep: 383.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-07-29 17:38:53,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.23 | bwd_microstep: 4998.11 | bwd_inner_microstep: 4944.32 | bwd_allreduce_microstep: 53.72 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3688 [2024-07-29 17:39:01,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-29 17:39:01,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.29 | bwd_microstep: 4878.61 | bwd_inner_microstep: 4859.05 | bwd_allreduce_microstep: 19.49 | step_microstep: 180.87 [2024-07-29 17:39:01,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28694.38 | bwd: 40675.52 | bwd_inner: 39267.73 | bwd_allreduce: 1407.31 | step: 181.44 45%|████▌ | 304/671 [5:55:47<7:06:23, 69.71s/it] {'loss': 1.18, 'learning_rate': 1.2016039479219293e-05, 'epoch': 0.45} 45%|████▌ | 304/671 [5:55:47<7:06:23, 69.71s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2351 [2024-07-29 17:39:10,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.37 | bwd_microstep: 5258.80 | bwd_inner_microstep: 4854.50 | bwd_allreduce_microstep: 404.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3863 [2024-07-29 17:39:19,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3643.03 | bwd_microstep: 5216.23 | bwd_inner_microstep: 5163.69 | bwd_allreduce_microstep: 52.48 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2050 [2024-07-29 17:39:27,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3045.64 | bwd_microstep: 5040.76 | bwd_inner_microstep: 4653.81 | bwd_allreduce_microstep: 386.88 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-07-29 17:39:36,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.32 | bwd_microstep: 5222.66 | bwd_inner_microstep: 4814.90 | bwd_allreduce_microstep: 407.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3760 [2024-07-29 17:39:44,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3209.01 | bwd_microstep: 4793.85 | bwd_inner_microstep: 4774.49 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3689 [2024-07-29 17:39:53,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.21 | bwd_microstep: 5050.07 | bwd_inner_microstep: 4977.19 | bwd_allreduce_microstep: 72.81 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3659 [2024-07-29 17:40:01,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.51 | bwd_microstep: 5097.54 | bwd_inner_microstep: 5013.53 | bwd_allreduce_microstep: 83.94 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-29 17:40:10,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 17:40:10,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.14 | bwd_microstep: 4994.78 | bwd_inner_microstep: 4975.44 | bwd_allreduce_microstep: 19.28 | step_microstep: 181.24 [2024-07-29 
17:40:10,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27848.14 | bwd: 40674.68 | bwd_inner: 39227.49 | bwd_allreduce: 1446.71 | step: 181.83 45%|████▌ | 305/671 [5:56:56<7:03:39, 69.45s/it] {'loss': 1.145, 'learning_rate': 1.1968676320483103e-05, 'epoch': 0.45} 45%|████▌ | 305/671 [5:56:56<7:03:39, 69.45s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-07-29 17:40:19,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.66 | bwd_microstep: 5209.22 | bwd_inner_microstep: 5126.46 | bwd_allreduce_microstep: 82.70 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3840 [2024-07-29 17:40:28,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.17 | bwd_microstep: 5251.63 | bwd_inner_microstep: 5196.38 | bwd_allreduce_microstep: 55.18 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2075 [2024-07-29 17:40:36,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3074.12 | bwd_microstep: 5046.29 | bwd_inner_microstep: 4658.50 | bwd_allreduce_microstep: 387.72 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3780 [2024-07-29 17:40:45,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.70 | bwd_microstep: 5014.25 | bwd_inner_microstep: 4994.85 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3757 [2024-07-29 17:40:54,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.20 | bwd_microstep: 5017.90 | bwd_inner_microstep: 4996.17 | bwd_allreduce_microstep: 21.66 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-07-29 17:41:02,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.81 | bwd_microstep: 
5055.22 | bwd_inner_microstep: 4992.05 | bwd_allreduce_microstep: 63.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-29 17:41:10,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3188.24 | bwd_microstep: 4705.48 | bwd_inner_microstep: 4686.04 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-29 17:41:19,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 17:41:19,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.06 | bwd_microstep: 4956.31 | bwd_inner_microstep: 4928.95 | bwd_allreduce_microstep: 27.29 | step_microstep: 181.49 [2024-07-29 17:41:19,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28232.86 | bwd: 40256.29 | bwd_inner: 39579.34 | bwd_allreduce: 676.46 | step: 182.09 46%|████▌ | 306/671 [5:58:05<7:01:20, 69.26s/it] {'loss': 1.157, 'learning_rate': 1.1921267173537083e-05, 'epoch': 0.46} 46%|████▌ | 306/671 [5:58:05<7:01:20, 69.26s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3844 [2024-07-29 17:41:28,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3656.06 | bwd_microstep: 5262.25 | bwd_inner_microstep: 5206.70 | bwd_allreduce_microstep: 55.48 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2029 [2024-07-29 17:41:37,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.39 | bwd_microstep: 5206.70 | bwd_inner_microstep: 4804.84 | bwd_allreduce_microstep: 401.80 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2268 [2024-07-29 17:41:45,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.15 | bwd_microstep: 5284.50 | bwd_inner_microstep: 4873.22 | bwd_allreduce_microstep: 411.21 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-29 17:41:53,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3183.44 | bwd_microstep: 4697.96 | bwd_inner_microstep: 4670.15 | bwd_allreduce_microstep: 27.74 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755 [2024-07-29 17:42:02,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.20 | bwd_microstep: 5010.74 | bwd_inner_microstep: 4990.05 | bwd_allreduce_microstep: 20.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-07-29 17:42:10,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3236.51 | bwd_microstep: 4786.06 | bwd_inner_microstep: 4766.80 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717 [2024-07-29 17:42:19,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.07 | bwd_microstep: 4989.70 | bwd_inner_microstep: 4970.26 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2152 [2024-07-29 17:42:28,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.42 [2024-07-29 17:42:28,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.33 | bwd_microstep: 5127.93 | bwd_inner_microstep: 4731.72 | bwd_allreduce_microstep: 396.13 | step_microstep: 181.03 [2024-07-29 17:42:28,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28160.05 | bwd: 40365.81 | bwd_inner: 39013.69 | bwd_allreduce: 1351.64 | step: 181.71 46%|████▌ | 307/671 [5:59:14<6:59:26, 69.14s/it] {'loss': 1.2327, 'learning_rate': 1.187381314585725e-05, 'epoch': 0.46} 46%|████▌ | 307/671 [5:59:14<6:59:26, 69.14s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic 
token length: 3524 [2024-07-29 17:42:37,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.13 | bwd_microstep: 5150.47 | bwd_inner_microstep: 5064.96 | bwd_allreduce_microstep: 85.44 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2281 [2024-07-29 17:42:45,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3485.03 | bwd_microstep: 5071.05 | bwd_inner_microstep: 4676.75 | bwd_allreduce_microstep: 394.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3747 [2024-07-29 17:42:54,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.89 | bwd_microstep: 5041.40 | bwd_inner_microstep: 5015.11 | bwd_allreduce_microstep: 26.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3632 [2024-07-29 17:43:03,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.60 | bwd_microstep: 5129.92 | bwd_inner_microstep: 5038.65 | bwd_allreduce_microstep: 91.20 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2109 [2024-07-29 17:43:11,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3475.12 | bwd_microstep: 5064.50 | bwd_inner_microstep: 4670.66 | bwd_allreduce_microstep: 393.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-29 17:43:20,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.53 | bwd_microstep: 4989.99 | bwd_inner_microstep: 4939.52 | bwd_allreduce_microstep: 50.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663 [2024-07-29 17:43:28,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.80 | bwd_microstep: 5100.70 | bwd_inner_microstep: 5032.36 | bwd_allreduce_microstep: 68.27 | step_microstep: 0.08 
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696 [2024-07-29 17:43:37,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-29 17:43:37,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.53 | bwd_microstep: 4897.81 | bwd_inner_microstep: 4878.42 | bwd_allreduce_microstep: 19.31 | step_microstep: 181.93 [2024-07-29 17:43:37,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28719.53 | bwd: 40445.81 | bwd_inner: 39316.38 | bwd_allreduce: 1128.96 | step: 182.50 46%|████▌ | 308/671 [6:00:23<6:58:56, 69.25s/it] {'loss': 1.2182, 'learning_rate': 1.1826315345968014e-05, 'epoch': 0.46} 46%|████▌ | 308/671 [6:00:23<6:58:56, 69.25s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3582 [2024-07-29 17:43:46,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.83 | bwd_microstep: 5222.23 | bwd_inner_microstep: 5132.06 | bwd_allreduce_microstep: 90.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3810 [2024-07-29 17:43:55,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.15 | bwd_microstep: 5223.86 | bwd_inner_microstep: 5168.55 | bwd_allreduce_microstep: 55.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3850 [2024-07-29 17:44:04,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.32 | bwd_microstep: 5047.10 | bwd_inner_microstep: 5016.10 | bwd_allreduce_microstep: 30.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3742 [2024-07-29 17:44:12,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.48 | bwd_microstep: 4823.25 | bwd_inner_microstep: 4803.34 | bwd_allreduce_microstep: 19.83 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2209 
[2024-07-29 17:44:21,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.21 | bwd_microstep: 5241.19 | bwd_inner_microstep: 4834.86 | bwd_allreduce_microstep: 406.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3711 [2024-07-29 17:44:29,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.37 | bwd_microstep: 4977.44 | bwd_inner_microstep: 4944.11 | bwd_allreduce_microstep: 33.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-29 17:44:37,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3210.52 | bwd_microstep: 4732.69 | bwd_inner_microstep: 4708.50 | bwd_allreduce_microstep: 24.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680 [2024-07-29 17:44:46,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 17:44:46,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.14 | bwd_microstep: 5190.87 | bwd_inner_microstep: 5113.22 | bwd_allreduce_microstep: 77.59 | step_microstep: 181.27 [2024-07-29 17:44:46,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28198.92 | bwd: 40458.61 | bwd_inner: 39720.67 | bwd_allreduce: 737.47 | step: 181.86 46%|████▌ | 309/671 [6:01:32<6:57:18, 69.17s/it] {'loss': 1.2291, 'learning_rate': 1.1778774883416325e-05, 'epoch': 0.46} 46%|████▌ | 309/671 [6:01:32<6:57:18, 69.17s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3587 [2024-07-29 17:44:55,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.33 | bwd_microstep: 5244.21 | bwd_inner_microstep: 5155.94 | bwd_allreduce_microstep: 88.20 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3849 [2024-07-29 17:45:04,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3777.65 | bwd_microstep: 5104.95 | bwd_inner_microstep: 5085.68 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-29 17:45:13,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.82 | bwd_microstep: 5209.20 | bwd_inner_microstep: 5129.04 | bwd_allreduce_microstep: 80.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2222 [2024-07-29 17:45:22,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.82 | bwd_microstep: 5252.10 | bwd_inner_microstep: 4843.36 | bwd_allreduce_microstep: 408.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3745 [2024-07-29 17:45:31,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.56 | bwd_microstep: 5164.46 | bwd_inner_microstep: 5111.36 | bwd_allreduce_microstep: 53.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3736 [2024-07-29 17:45:39,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.90 | bwd_microstep: 4794.89 | bwd_inner_microstep: 4775.51 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.09 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1142 [2024-07-29 17:45:47,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2984.85 | bwd_microstep: 4998.26 | bwd_inner_microstep: 4617.79 | bwd_allreduce_microstep: 380.40 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156 [2024-07-29 17:45:56,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 17:45:56,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.82 | bwd_microstep: 5464.18 | bwd_inner_microstep: 4932.53 | bwd_allreduce_microstep: 531.59 | step_microstep: 
181.18 [2024-07-29 17:45:56,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27952.67 | bwd: 41232.23 | bwd_inner: 39651.17 | bwd_allreduce: 1580.60 | step: 181.76 46%|████▌ | 310/671 [6:02:42<6:56:46, 69.27s/it] {'loss': 1.2624, 'learning_rate': 1.1731192868745717e-05, 'epoch': 0.46} 46%|████▌ | 310/671 [6:02:42<6:56:46, 69.27s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3553 [2024-07-29 17:46:04,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3196.12 | bwd_microstep: 4789.80 | bwd_inner_microstep: 4741.51 | bwd_allreduce_microstep: 48.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3593 [2024-07-29 17:46:12,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3191.08 | bwd_microstep: 4811.67 | bwd_inner_microstep: 4774.63 | bwd_allreduce_microstep: 36.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636 [2024-07-29 17:46:21,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.19 | bwd_microstep: 5197.69 | bwd_inner_microstep: 5116.29 | bwd_allreduce_microstep: 81.34 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3767 [2024-07-29 17:46:30,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3782.17 | bwd_microstep: 5048.28 | bwd_inner_microstep: 5023.21 | bwd_allreduce_microstep: 25.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644 [2024-07-29 17:46:38,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.73 | bwd_microstep: 5170.52 | bwd_inner_microstep: 5085.93 | bwd_allreduce_microstep: 84.52 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-07-29 17:46:47,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.52 
| bwd_microstep: 5161.81 | bwd_inner_microstep: 4759.64 | bwd_allreduce_microstep: 402.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-07-29 17:46:56,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.38 | bwd_microstep: 5076.20 | bwd_inner_microstep: 5015.24 | bwd_allreduce_microstep: 60.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735 [2024-07-29 17:47:04,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.43 [2024-07-29 17:47:04,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.83 | bwd_microstep: 4961.85 | bwd_inner_microstep: 4932.54 | bwd_allreduce_microstep: 29.25 | step_microstep: 181.52 [2024-07-29 17:47:04,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28145.94 | bwd: 40217.79 | bwd_inner: 39448.92 | bwd_allreduce: 768.40 | step: 182.19 46%|████▋ | 311/671 [6:03:50<6:54:34, 69.10s/it] {'loss': 1.1245, 'learning_rate': 1.1683570413470386e-05, 'epoch': 0.46} 46%|████▋ | 311/671 [6:03:50<6:54:34, 69.10s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3873 [2024-07-29 17:47:13,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.68 | bwd_microstep: 5036.65 | bwd_inner_microstep: 5013.09 | bwd_allreduce_microstep: 23.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3604 [2024-07-29 17:47:22,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.82 | bwd_microstep: 5241.65 | bwd_inner_microstep: 5154.24 | bwd_allreduce_microstep: 87.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3815 [2024-07-29 17:47:31,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.52 | bwd_microstep: 5053.66 | bwd_inner_microstep: 5034.28 | bwd_allreduce_microstep: 
19.30 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2111 [2024-07-29 17:47:40,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.06 | bwd_microstep: 5217.42 | bwd_inner_microstep: 4812.92 | bwd_allreduce_microstep: 404.44 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2230 [2024-07-29 17:47:48,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.01 | bwd_microstep: 5231.15 | bwd_inner_microstep: 4827.37 | bwd_allreduce_microstep: 403.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732 [2024-07-29 17:47:57,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.03 | bwd_microstep: 5060.51 | bwd_inner_microstep: 5019.38 | bwd_allreduce_microstep: 41.07 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2159 [2024-07-29 17:48:06,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.32 | bwd_microstep: 5153.96 | bwd_inner_microstep: 4754.49 | bwd_allreduce_microstep: 399.40 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3708 [2024-07-29 17:48:15,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 17:48:15,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.46 | bwd_microstep: 5034.09 | bwd_inner_microstep: 4959.57 | bwd_allreduce_microstep: 74.46 | step_microstep: 181.95 [2024-07-29 17:48:15,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28776.81 | bwd: 41029.08 | bwd_inner: 39575.26 | bwd_allreduce: 1453.34 | step: 182.52 46%|████▋ | 312/671 [6:05:01<6:55:17, 69.41s/it] {'loss': 1.2198, 'learning_rate': 1.163590863004922e-05, 'epoch': 0.46} 46%|████▋ | 312/671 [6:05:01<6:55:17, 69.41s/it]dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3949 [2024-07-29 17:48:23,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3322.38 | bwd_microstep: 5009.43 | bwd_inner_microstep: 4987.00 | bwd_allreduce_microstep: 22.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3579 [2024-07-29 17:48:32,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.59 | bwd_microstep: 5190.70 | bwd_inner_microstep: 5099.98 | bwd_allreduce_microstep: 90.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3757 [2024-07-29 17:48:41,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.82 | bwd_microstep: 4999.38 | bwd_inner_microstep: 4979.88 | bwd_allreduce_microstep: 19.43 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712 [2024-07-29 17:48:49,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.05 | bwd_microstep: 5047.24 | bwd_inner_microstep: 5007.39 | bwd_allreduce_microstep: 39.78 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2087 [2024-07-29 17:48:58,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3463.87 | bwd_microstep: 5052.90 | bwd_inner_microstep: 4660.01 | bwd_allreduce_microstep: 392.81 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-29 17:49:06,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.93 | bwd_microstep: 4719.21 | bwd_inner_microstep: 4695.18 | bwd_allreduce_microstep: 23.96 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2134 [2024-07-29 17:49:15,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.54 | bwd_microstep: 5162.46 | bwd_inner_microstep: 4758.85 | bwd_allreduce_microstep: 403.54 | step_microstep: 
0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709 [2024-07-29 17:49:23,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-29 17:49:23,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3707.16 | bwd_microstep: 4891.26 | bwd_inner_microstep: 4871.78 | bwd_allreduce_microstep: 19.41 | step_microstep: 181.54 [2024-07-29 17:49:23,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28301.23 | bwd: 40072.57 | bwd_inner: 39060.02 | bwd_allreduce: 1012.07 | step: 182.14 47%|████▋ | 313/671 [6:06:09<6:52:52, 69.20s/it] {'loss': 1.121, 'learning_rate': 1.1588208631859808e-05, 'epoch': 0.47} 47%|████▋ | 313/671 [6:06:09<6:52:52, 69.20s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3670 [2024-07-29 17:49:31,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3260.45 | bwd_microstep: 4839.91 | bwd_inner_microstep: 4798.15 | bwd_allreduce_microstep: 41.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3857 [2024-07-29 17:49:40,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3653.47 | bwd_microstep: 5205.06 | bwd_inner_microstep: 5151.00 | bwd_allreduce_microstep: 54.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3766 [2024-07-29 17:49:49,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.44 | bwd_microstep: 5012.00 | bwd_inner_microstep: 4992.47 | bwd_allreduce_microstep: 19.45 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-07-29 17:49:58,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.46 | bwd_microstep: 5155.90 | bwd_inner_microstep: 5080.34 | bwd_allreduce_microstep: 75.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3731 
[2024-07-29 17:50:07,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.46 | bwd_microstep: 5156.34 | bwd_inner_microstep: 5102.92 | bwd_allreduce_microstep: 53.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-29 17:50:15,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.60 | bwd_microstep: 5178.51 | bwd_inner_microstep: 4775.64 | bwd_allreduce_microstep: 402.80 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-29 17:50:23,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.93 | bwd_microstep: 4714.70 | bwd_inner_microstep: 4690.53 | bwd_allreduce_microstep: 24.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684 [2024-07-29 17:50:32,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-29 17:50:32,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.76 | bwd_microstep: 5029.53 | bwd_inner_microstep: 4973.50 | bwd_allreduce_microstep: 55.96 | step_microstep: 182.50 [2024-07-29 17:50:32,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28204.47 | bwd: 40291.93 | bwd_inner: 39564.50 | bwd_allreduce: 726.96 | step: 183.09 47%|████▋ | 314/671 [6:07:18<6:51:03, 69.09s/it] {'loss': 1.1509, 'learning_rate': 1.154047153317243e-05, 'epoch': 0.47} 47%|████▋ | 314/671 [6:07:18<6:51:03, 69.09s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3934 [2024-07-29 17:50:41,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.58 | bwd_microstep: 5235.86 | bwd_inner_microstep: 5190.19 | bwd_allreduce_microstep: 45.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3590 [2024-07-29 17:50:50,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3619.80 | bwd_microstep: 5181.57 | bwd_inner_microstep: 5103.86 | bwd_allreduce_microstep: 77.65 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3597 [2024-07-29 17:50:58,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3055.84 | bwd_microstep: 4797.64 | bwd_inner_microstep: 4752.94 | bwd_allreduce_microstep: 44.64 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741 [2024-07-29 17:51:07,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.27 | bwd_microstep: 5026.87 | bwd_inner_microstep: 5002.33 | bwd_allreduce_microstep: 24.48 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2238 [2024-07-29 17:51:15,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3046.01 | bwd_microstep: 4996.72 | bwd_inner_microstep: 4610.11 | bwd_allreduce_microstep: 386.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-29 17:51:23,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.33 | bwd_microstep: 4995.09 | bwd_inner_microstep: 4939.38 | bwd_allreduce_microstep: 55.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625 [2024-07-29 17:51:32,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.17 | bwd_microstep: 5158.78 | bwd_inner_microstep: 5078.88 | bwd_allreduce_microstep: 79.83 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3686 [2024-07-29 17:51:41,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 17:51:41,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.18 | bwd_microstep: 5108.56 | bwd_inner_microstep: 5040.95 | bwd_allreduce_microstep: 67.55 | step_microstep: 180.70 
[2024-07-29 17:51:41,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27856.07 | bwd: 40501.08 | bwd_inner: 39718.57 | bwd_allreduce: 782.03 | step: 181.39 47%|████▋ | 315/671 [6:08:27<6:49:12, 68.97s/it] {'loss': 1.0908, 'learning_rate': 1.1492698449124042e-05, 'epoch': 0.47} 47%|████▋ | 315/671 [6:08:27<6:49:12, 68.97s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3843 [2024-07-29 17:51:50,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.48 | bwd_microstep: 5229.36 | bwd_inner_microstep: 5179.07 | bwd_allreduce_microstep: 50.20 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2259 [2024-07-29 17:51:58,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3026.25 | bwd_microstep: 4984.98 | bwd_inner_microstep: 4597.93 | bwd_allreduce_microstep: 386.99 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3833 [2024-07-29 17:52:07,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3767.79 | bwd_microstep: 5048.79 | bwd_inner_microstep: 5029.43 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642 [2024-07-29 17:52:15,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.12 | bwd_microstep: 5154.34 | bwd_inner_microstep: 5077.05 | bwd_allreduce_microstep: 77.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-07-29 17:52:24,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.47 | bwd_microstep: 5028.19 | bwd_inner_microstep: 5006.95 | bwd_allreduce_microstep: 21.18 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-07-29 17:52:33,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.54 | 
bwd_microstep: 5139.36 | bwd_inner_microstep: 4739.46 | bwd_allreduce_microstep: 399.84 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722 [2024-07-29 17:52:42,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.55 | bwd_microstep: 5000.45 | bwd_inner_microstep: 4981.10 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-29 17:52:50,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-29 17:52:50,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3231.19 | bwd_microstep: 4823.79 | bwd_inner_microstep: 4786.09 | bwd_allreduce_microstep: 37.63 | step_microstep: 181.78 [2024-07-29 17:52:50,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28255.29 | bwd: 40409.24 | bwd_inner: 39397.02 | bwd_allreduce: 1011.73 | step: 182.34 47%|████▋ | 316/671 [6:09:36<6:48:06, 68.98s/it] {'loss': 1.2005, 'learning_rate': 1.1444890495692214e-05, 'epoch': 0.47} 47%|████▋ | 316/671 [6:09:36<6:48:06, 68.98s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3600 [2024-07-29 17:52:59,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.36 | bwd_microstep: 5117.53 | bwd_inner_microstep: 5032.35 | bwd_allreduce_microstep: 85.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3777 [2024-07-29 17:53:07,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.10 | bwd_microstep: 5197.46 | bwd_inner_microstep: 5143.08 | bwd_allreduce_microstep: 54.31 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3771 [2024-07-29 17:53:16,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.60 | bwd_microstep: 5217.28 | bwd_inner_microstep: 5142.19 | bwd_allreduce_microstep: 75.02 
| step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3763 [2024-07-29 17:53:24,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3216.60 | bwd_microstep: 4806.03 | bwd_inner_microstep: 4786.63 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631 [2024-07-29 17:53:32,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3164.77 | bwd_microstep: 4674.61 | bwd_inner_microstep: 4651.40 | bwd_allreduce_microstep: 23.14 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686 [2024-07-29 17:53:41,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3677.98 | bwd_microstep: 4901.60 | bwd_inner_microstep: 4882.16 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644 [2024-07-29 17:53:49,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.11 | bwd_microstep: 5060.34 | bwd_inner_microstep: 4996.07 | bwd_allreduce_microstep: 64.20 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-29 17:53:57,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 17:53:57,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3179.07 | bwd_microstep: 4693.83 | bwd_inner_microstep: 4674.41 | bwd_allreduce_microstep: 19.34 | step_microstep: 181.33 [2024-07-29 17:53:57,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27622.50 | bwd: 39668.66 | bwd_inner: 39308.24 | bwd_allreduce: 359.94 | step: 181.91 47%|████▋ | 317/671 [6:10:43<6:44:33, 68.57s/it] {'loss': 1.1822, 'learning_rate': 1.1397048789669061e-05, 'epoch': 0.47} 47%|████▋ | 317/671 [6:10:43<6:44:33, 68.57s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic 
token length: 3969 [2024-07-29 17:54:07,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3843.30 | bwd_microstep: 5243.57 | bwd_inner_microstep: 5224.43 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3998 [2024-07-29 17:54:16,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3848.07 | bwd_microstep: 5287.99 | bwd_inner_microstep: 5268.63 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3796 [2024-07-29 17:54:25,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.86 | bwd_microstep: 5024.92 | bwd_inner_microstep: 5005.43 | bwd_allreduce_microstep: 19.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609 [2024-07-29 17:54:33,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3226.97 | bwd_microstep: 4857.61 | bwd_inner_microstep: 4810.71 | bwd_allreduce_microstep: 46.83 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3787 [2024-07-29 17:54:41,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.20 | bwd_microstep: 5036.79 | bwd_inner_microstep: 5017.45 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710 [2024-07-29 17:54:50,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.52 | bwd_microstep: 5181.27 | bwd_inner_microstep: 5107.74 | bwd_allreduce_microstep: 73.46 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161 [2024-07-29 17:54:59,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.66 | bwd_microstep: 5146.14 | bwd_inner_microstep: 4747.21 | bwd_allreduce_microstep: 398.87 | step_microstep: 0.09 
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3680 [2024-07-29 17:55:08,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 17:55:08,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.20 | bwd_microstep: 4974.97 | bwd_inner_microstep: 4910.74 | bwd_allreduce_microstep: 64.16 | step_microstep: 180.89 [2024-07-29 17:55:08,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29127.67 | bwd: 40753.24 | bwd_inner: 40092.29 | bwd_allreduce: 660.48 | step: 181.49 47%|████▋ | 318/671 [6:11:54<6:46:19, 69.06s/it] {'loss': 1.1689, 'learning_rate': 1.1349174448635158e-05, 'epoch': 0.47} 47%|████▋ | 318/671 [6:11:54<6:46:19, 69.06s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2390 [2024-07-29 17:55:17,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.28 | bwd_microstep: 5504.43 | bwd_inner_microstep: 5098.92 | bwd_allreduce_microstep: 405.44 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3763 [2024-07-29 17:55:26,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.12 | bwd_microstep: 5020.57 | bwd_inner_microstep: 4996.58 | bwd_allreduce_microstep: 23.92 | step_microstep: 0.07 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2267 [2024-07-29 17:55:34,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.68 | bwd_microstep: 5192.14 | bwd_inner_microstep: 4787.94 | bwd_allreduce_microstep: 404.13 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2237 [2024-07-29 17:55:43,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3471.92 | bwd_microstep: 5113.01 | bwd_inner_microstep: 4717.49 | bwd_allreduce_microstep: 395.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717 
[2024-07-29 17:55:52,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.70 | bwd_microstep: 5044.06 | bwd_inner_microstep: 5002.11 | bwd_allreduce_microstep: 41.89 | step_microstep: 0.08 dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 651 [2024-07-29 17:56:00,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3466.31 | bwd_microstep: 5165.69 | bwd_inner_microstep: 4767.42 | bwd_allreduce_microstep: 398.20 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3635 [2024-07-29 17:56:09,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.26 | bwd_microstep: 5052.62 | bwd_inner_microstep: 4970.78 | bwd_allreduce_microstep: 81.76 | step_microstep: 0.11 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-29 17:56:18,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 17:56:18,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.26 | bwd_microstep: 5074.79 | bwd_inner_microstep: 5011.51 | bwd_allreduce_microstep: 63.21 | step_microstep: 180.66 [2024-07-29 17:56:18,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28596.44 | bwd: 41167.30 | bwd_inner: 39352.69 | bwd_allreduce: 1814.11 | step: 181.38 48%|████▊ | 319/671 [6:13:04<6:46:59, 69.37s/it] {'loss': 1.1987, 'learning_rate': 1.1301268590933434e-05, 'epoch': 0.47} 48%|████▊ | 319/671 [6:13:04<6:46:59, 69.37s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2369 [2024-07-29 17:56:27,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.40 | bwd_microstep: 5218.63 | bwd_inner_microstep: 4816.95 | bwd_allreduce_microstep: 401.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3759 [2024-07-29 17:56:35,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3604.26 | bwd_microstep: 5105.71 | bwd_inner_microstep: 5058.42 | bwd_allreduce_microstep: 47.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2211 [2024-07-29 17:56:44,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.48 | bwd_microstep: 5162.93 | bwd_inner_microstep: 4761.15 | bwd_allreduce_microstep: 401.72 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-29 17:56:53,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.55 | bwd_microstep: 4974.71 | bwd_inner_microstep: 4955.42 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-29 17:57:01,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.64 | bwd_microstep: 5026.71 | bwd_inner_microstep: 5001.06 | bwd_allreduce_microstep: 25.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3727 [2024-07-29 17:57:10,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.84 | bwd_microstep: 5175.75 | bwd_inner_microstep: 5121.20 | bwd_allreduce_microstep: 54.49 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3700 [2024-07-29 17:57:19,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.03 | bwd_microstep: 4922.03 | bwd_inner_microstep: 4896.73 | bwd_allreduce_microstep: 25.24 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2146 [2024-07-29 17:57:28,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 17:57:28,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.31 | bwd_microstep: 5124.90 | bwd_inner_microstep: 4726.93 | bwd_allreduce_microstep: 397.90 | step_microstep: 
180.95 [2024-07-29 17:57:28,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28990.41 | bwd: 40711.34 | bwd_inner: 39337.79 | bwd_allreduce: 1373.08 | step: 181.52 48%|████▊ | 320/671 [6:14:14<6:46:59, 69.57s/it] {'loss': 1.1795, 'learning_rate': 1.1253332335643043e-05, 'epoch': 0.48} 48%|████▊ | 320/671 [6:14:14<6:46:59, 69.57s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3918 [2024-07-29 17:57:37,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.89 | bwd_microstep: 5381.34 | bwd_inner_microstep: 5312.57 | bwd_allreduce_microstep: 68.71 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3797 [2024-07-29 17:57:46,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.72 | bwd_microstep: 5036.13 | bwd_inner_microstep: 5016.30 | bwd_allreduce_microstep: 19.76 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2211 [2024-07-29 17:57:55,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.87 | bwd_microstep: 5296.15 | bwd_inner_microstep: 4884.39 | bwd_allreduce_microstep: 411.69 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2275 [2024-07-29 17:58:03,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.93 | bwd_microstep: 5252.83 | bwd_inner_microstep: 4844.49 | bwd_allreduce_microstep: 408.27 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3632 [2024-07-29 17:58:12,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.08 | bwd_microstep: 5087.33 | bwd_inner_microstep: 4994.21 | bwd_allreduce_microstep: 93.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-07-29 17:58:21,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.63 
| bwd_microstep: 4978.37 | bwd_inner_microstep: 4959.07 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 628 [2024-07-29 17:58:29,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2977.75 | bwd_microstep: 5031.15 | bwd_inner_microstep: 4649.35 | bwd_allreduce_microstep: 381.72 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3664 [2024-07-29 17:58:38,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-29 17:58:38,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.59 | bwd_microstep: 5185.32 | bwd_inner_microstep: 5101.74 | bwd_allreduce_microstep: 83.52 | step_microstep: 181.89 [2024-07-29 17:58:38,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28530.38 | bwd: 41248.60 | bwd_inner: 39762.06 | bwd_allreduce: 1486.06 | step: 182.45 48%|████▊ | 321/671 [6:15:24<6:46:46, 69.73s/it] {'loss': 1.1771, 'learning_rate': 1.1205366802553233e-05, 'epoch': 0.48} 48%|████▊ | 321/671 [6:15:24<6:46:46, 69.73s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3856 [2024-07-29 17:58:47,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3677.08 | bwd_microstep: 5317.07 | bwd_inner_microstep: 5252.06 | bwd_allreduce_microstep: 64.95 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2051 [2024-07-29 17:58:56,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.34 | bwd_microstep: 5216.76 | bwd_inner_microstep: 4811.73 | bwd_allreduce_microstep: 404.97 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3766 [2024-07-29 17:59:04,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.26 | bwd_microstep: 5008.67 | bwd_inner_microstep: 4989.23 | bwd_allreduce_microstep: 19.35 
| step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729 [2024-07-29 17:59:13,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.29 | bwd_microstep: 4981.64 | bwd_inner_microstep: 4962.32 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2179 [2024-07-29 17:59:22,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.60 | bwd_microstep: 5220.15 | bwd_inner_microstep: 4816.27 | bwd_allreduce_microstep: 403.82 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-29 17:59:31,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.46 | bwd_microstep: 4984.34 | bwd_inner_microstep: 4965.02 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3707 [2024-07-29 17:59:39,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.77 | bwd_microstep: 5067.34 | bwd_inner_microstep: 5002.80 | bwd_allreduce_microstep: 64.48 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2136 [2024-07-29 17:59:48,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-29 17:59:48,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3464.79 | bwd_microstep: 5041.48 | bwd_inner_microstep: 4650.80 | bwd_allreduce_microstep: 390.62 | step_microstep: 180.83 [2024-07-29 17:59:48,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28986.50 | bwd: 40837.44 | bwd_inner: 39450.16 | bwd_allreduce: 1386.78 | step: 181.52 48%|████▊ | 322/671 [6:16:34<6:46:20, 69.86s/it] {'loss': 1.1479, 'learning_rate': 1.1157373112137171e-05, 'epoch': 0.48} 48%|████▊ | 322/671 [6:16:34<6:46:20, 69.86s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic 
token length: 3547 [2024-07-29 17:59:57,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.68 | bwd_microstep: 5156.72 | bwd_inner_microstep: 5048.13 | bwd_allreduce_microstep: 108.53 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2070 [2024-07-29 18:00:05,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3082.89 | bwd_microstep: 5131.84 | bwd_inner_microstep: 4737.56 | bwd_allreduce_microstep: 394.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3787 [2024-07-29 18:00:14,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.57 | bwd_microstep: 5200.10 | bwd_inner_microstep: 5143.45 | bwd_allreduce_microstep: 56.59 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2085 [2024-07-29 18:00:23,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.83 | bwd_microstep: 5194.35 | bwd_inner_microstep: 4789.83 | bwd_allreduce_microstep: 404.46 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2205 [2024-07-29 18:00:31,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3479.60 | bwd_microstep: 5064.35 | bwd_inner_microstep: 4668.99 | bwd_allreduce_microstep: 395.29 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171 [2024-07-29 18:00:40,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.65 | bwd_microstep: 5192.16 | bwd_inner_microstep: 4790.15 | bwd_allreduce_microstep: 401.94 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3663 [2024-07-29 18:00:49,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.56 | bwd_microstep: 5105.52 | bwd_inner_microstep: 5019.77 | bwd_allreduce_microstep: 85.69 | step_microstep: 0.08 
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-29 18:00:57,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 18:00:57,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.03 | bwd_microstep: 5027.03 | bwd_inner_microstep: 4970.36 | bwd_allreduce_microstep: 56.60 | step_microstep: 181.10 [2024-07-29 18:00:57,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28025.71 | bwd: 41072.07 | bwd_inner: 39168.17 | bwd_allreduce: 1903.42 | step: 181.69 48%|████▊ | 323/671 [6:17:43<6:44:25, 69.73s/it] {'loss': 1.1888, 'learning_rate': 1.1109352385525782e-05, 'epoch': 0.48} 48%|████▊ | 323/671 [6:17:43<6:44:25, 69.73s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4037 [2024-07-29 18:01:07,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3856.38 | bwd_microstep: 5319.77 | bwd_inner_microstep: 5300.67 | bwd_allreduce_microstep: 19.03 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3600 [2024-07-29 18:01:16,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.09 | bwd_microstep: 5203.56 | bwd_inner_microstep: 5102.77 | bwd_allreduce_microstep: 100.72 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3807 [2024-07-29 18:01:24,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.22 | bwd_microstep: 5041.14 | bwd_inner_microstep: 5021.81 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2189 [2024-07-29 18:01:33,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.45 | bwd_microstep: 5182.89 | bwd_inner_microstep: 4780.95 | bwd_allreduce_microstep: 401.89 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2219 
[2024-07-29 18:01:42,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.08 | bwd_microstep: 5223.54 | bwd_inner_microstep: 4815.08 | bwd_allreduce_microstep: 408.40 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3672 [2024-07-29 18:01:51,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.09 | bwd_microstep: 5091.76 | bwd_inner_microstep: 5008.90 | bwd_allreduce_microstep: 82.80 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707 [2024-07-29 18:01:59,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.40 | bwd_microstep: 4955.86 | bwd_inner_microstep: 4925.98 | bwd_allreduce_microstep: 29.82 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705 [2024-07-29 18:02:08,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.78 [2024-07-29 18:02:08,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.76 | bwd_microstep: 5024.11 | bwd_inner_microstep: 4984.24 | bwd_allreduce_microstep: 39.80 | step_microstep: 180.96 [2024-07-29 18:02:08,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29376.36 | bwd: 41042.62 | bwd_inner: 39940.33 | bwd_allreduce: 1101.82 | step: 181.53 48%|████▊ | 324/671 [6:18:54<6:45:02, 70.04s/it] {'loss': 1.1953, 'learning_rate': 1.1061305744481561e-05, 'epoch': 0.48} 48%|████▊ | 324/671 [6:18:54<6:45:02, 70.04s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2283 [2024-07-29 18:02:16,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3101.36 | bwd_microstep: 5135.49 | bwd_inner_microstep: 4745.00 | bwd_allreduce_microstep: 390.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3866 [2024-07-29 18:02:25,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3781.86 | bwd_microstep: 5111.81 | bwd_inner_microstep: 5092.41 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2261 [2024-07-29 18:02:34,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.26 | bwd_microstep: 5225.39 | bwd_inner_microstep: 4817.85 | bwd_allreduce_microstep: 407.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645 [2024-07-29 18:02:42,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3205.61 | bwd_microstep: 4776.08 | bwd_inner_microstep: 4741.05 | bwd_allreduce_microstep: 34.97 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2116 [2024-07-29 18:02:51,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.36 | bwd_microstep: 5179.06 | bwd_inner_microstep: 4777.07 | bwd_allreduce_microstep: 401.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-29 18:03:00,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.04 | bwd_microstep: 5184.48 | bwd_inner_microstep: 5110.14 | bwd_allreduce_microstep: 74.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3693 [2024-07-29 18:03:09,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.28 | bwd_microstep: 5176.68 | bwd_inner_microstep: 5082.14 | bwd_allreduce_microstep: 94.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-07-29 18:03:17,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 18:03:17,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.28 | bwd_microstep: 5055.71 | bwd_inner_microstep: 4999.31 | bwd_allreduce_microstep: 56.34 | step_microstep: 181.13 
[2024-07-29 18:03:17,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27974.95 | bwd: 40844.68 | bwd_inner: 39364.89 | bwd_allreduce: 1479.31 | step: 181.71 48%|████▊ | 325/671 [6:20:03<6:42:19, 69.77s/it] {'loss': 1.2657, 'learning_rate': 1.1013234311372353e-05, 'epoch': 0.48} 48%|████▊ | 325/671 [6:20:03<6:42:19, 69.77s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2312 [2024-07-29 18:03:26,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.27 | bwd_microstep: 5420.59 | bwd_inner_microstep: 5003.86 | bwd_allreduce_microstep: 416.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3819 [2024-07-29 18:03:35,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.41 | bwd_microstep: 5043.42 | bwd_inner_microstep: 5024.08 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3747 [2024-07-29 18:03:44,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.91 | bwd_microstep: 5054.62 | bwd_inner_microstep: 5027.13 | bwd_allreduce_microstep: 27.42 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2257 [2024-07-29 18:03:52,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3062.96 | bwd_microstep: 4993.03 | bwd_inner_microstep: 4608.59 | bwd_allreduce_microstep: 384.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3690 [2024-07-29 18:04:00,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3187.76 | bwd_microstep: 4775.95 | bwd_inner_microstep: 4743.25 | bwd_allreduce_microstep: 32.63 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-29 18:04:09,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3508.31 | 
bwd_microstep: 5175.84 | bwd_inner_microstep: 4773.38 | bwd_allreduce_microstep: 402.39 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2191 [2024-07-29 18:04:18,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.26 | bwd_microstep: 5231.50 | bwd_inner_microstep: 4823.67 | bwd_allreduce_microstep: 407.77 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2143 [2024-07-29 18:04:27,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 18:04:27,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.55 | bwd_microstep: 5137.72 | bwd_inner_microstep: 4739.57 | bwd_allreduce_microstep: 398.08 | step_microstep: 181.53 [2024-07-29 18:04:27,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27981.32 | bwd: 40832.66 | bwd_inner: 38743.48 | bwd_allreduce: 2088.71 | step: 182.12 49%|████▊ | 326/671 [6:21:12<6:40:05, 69.58s/it] {'loss': 1.2082, 'learning_rate': 1.096513920914515e-05, 'epoch': 0.49} 49%|████▊ | 326/671 [6:21:12<6:40:05, 69.58s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3904 [2024-07-29 18:04:36,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3843.66 | bwd_microstep: 5216.17 | bwd_inner_microstep: 5188.52 | bwd_allreduce_microstep: 27.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3825 [2024-07-29 18:04:44,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.69 | bwd_microstep: 5173.20 | bwd_inner_microstep: 5124.55 | bwd_allreduce_microstep: 48.59 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3798 [2024-07-29 18:04:53,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3790.14 | bwd_microstep: 5042.76 | bwd_inner_microstep: 5021.02 | bwd_allreduce_microstep: 
21.67 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3641 [2024-07-29 18:05:01,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3129.93 | bwd_microstep: 4998.32 | bwd_inner_microstep: 4928.03 | bwd_allreduce_microstep: 70.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3746 [2024-07-29 18:05:10,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.99 | bwd_microstep: 5015.13 | bwd_inner_microstep: 4958.40 | bwd_allreduce_microstep: 56.67 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720 [2024-07-29 18:05:19,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.52 | bwd_microstep: 4992.38 | bwd_inner_microstep: 4973.02 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-07-29 18:05:28,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.98 | bwd_microstep: 5055.18 | bwd_inner_microstep: 5013.17 | bwd_allreduce_microstep: 41.94 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2148 [2024-07-29 18:05:36,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 18:05:36,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.25 | bwd_microstep: 5109.72 | bwd_inner_microstep: 4714.45 | bwd_allreduce_microstep: 395.20 | step_microstep: 180.66 [2024-07-29 18:05:36,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28929.06 | bwd: 40602.84 | bwd_inner: 39921.10 | bwd_allreduce: 681.28 | step: 181.34 49%|████▊ | 327/671 [6:22:22<6:39:25, 69.67s/it] {'loss': 1.212, 'learning_rate': 1.0917021561299864e-05, 'epoch': 0.49} 49%|████▊ | 327/671 [6:22:22<6:39:25, 69.67s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic 
token length: 3861 [2024-07-29 18:05:46,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3841.18 | bwd_microstep: 5253.80 | bwd_inner_microstep: 5215.62 | bwd_allreduce_microstep: 38.11 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3566 [2024-07-29 18:05:54,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3149.25 | bwd_microstep: 5104.29 | bwd_inner_microstep: 5011.53 | bwd_allreduce_microstep: 92.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3746 [2024-07-29 18:06:03,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.99 | bwd_microstep: 4999.52 | bwd_inner_microstep: 4980.09 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2232 [2024-07-29 18:06:11,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.63 | bwd_microstep: 5197.84 | bwd_inner_microstep: 4791.92 | bwd_allreduce_microstep: 405.86 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3761 [2024-07-29 18:06:20,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.15 | bwd_microstep: 4999.12 | bwd_inner_microstep: 4979.76 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729 [2024-07-29 18:06:29,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.96 | bwd_microstep: 4987.72 | bwd_inner_microstep: 4968.24 | bwd_allreduce_microstep: 19.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667 [2024-07-29 18:06:37,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.14 | bwd_microstep: 5069.18 | bwd_inner_microstep: 5012.45 | bwd_allreduce_microstep: 56.66 | step_microstep: 0.10 
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-29 18:06:46,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-29 18:06:46,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.86 | bwd_microstep: 4904.23 | bwd_inner_microstep: 4883.37 | bwd_allreduce_microstep: 20.79 | step_microstep: 180.85 [2024-07-29 18:06:46,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29044.07 | bwd: 40515.67 | bwd_inner: 39842.93 | bwd_allreduce: 672.26 | step: 181.44 49%|████▉ | 328/671 [6:23:32<6:38:38, 69.73s/it] {'loss': 1.1922, 'learning_rate': 1.0868882491863048e-05, 'epoch': 0.49} 49%|████▉ | 328/671 [6:23:32<6:38:38, 69.73s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3668 [2024-07-29 18:06:55,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.32 | bwd_microstep: 5189.50 | bwd_inner_microstep: 5099.73 | bwd_allreduce_microstep: 89.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3869 [2024-07-29 18:07:04,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.81 | bwd_microstep: 5136.35 | bwd_inner_microstep: 5098.75 | bwd_allreduce_microstep: 37.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603 [2024-07-29 18:07:13,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.28 | bwd_microstep: 5154.29 | bwd_inner_microstep: 5078.48 | bwd_allreduce_microstep: 75.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3601 [2024-07-29 18:07:22,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.70 | bwd_microstep: 5197.37 | bwd_inner_microstep: 5115.84 | bwd_allreduce_microstep: 81.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735 
[2024-07-29 18:07:30,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3640.31 | bwd_microstep: 5204.88 | bwd_inner_microstep: 5146.21 | bwd_allreduce_microstep: 58.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3738 [2024-07-29 18:07:39,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.80 | bwd_microstep: 5157.50 | bwd_inner_microstep: 5102.00 | bwd_allreduce_microstep: 55.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-07-29 18:07:47,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3239.25 | bwd_microstep: 4732.52 | bwd_inner_microstep: 4706.19 | bwd_allreduce_microstep: 26.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-29 18:07:56,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-29 18:07:56,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.66 | bwd_microstep: 5063.15 | bwd_inner_microstep: 5006.02 | bwd_allreduce_microstep: 57.06 | step_microstep: 182.78 [2024-07-29 18:07:56,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28537.05 | bwd: 40835.55 | bwd_inner: 40353.15 | bwd_allreduce: 481.92 | step: 183.36 49%|████▉ | 329/671 [6:24:42<6:37:25, 69.73s/it] {'loss': 1.1676, 'learning_rate': 1.0820723125361685e-05, 'epoch': 0.49} 49%|████▉ | 329/671 [6:24:42<6:37:25, 69.73s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3921 [2024-07-29 18:08:05,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3845.89 | bwd_microstep: 5167.29 | bwd_inner_microstep: 5148.17 | bwd_allreduce_microstep: 19.05 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2067 [2024-07-29 18:08:14,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3563.48 | bwd_microstep: 5288.59 | bwd_inner_microstep: 4879.34 | bwd_allreduce_microstep: 409.19 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3630 [2024-07-29 18:08:23,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.99 | bwd_microstep: 5190.03 | bwd_inner_microstep: 5099.49 | bwd_allreduce_microstep: 90.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754 [2024-07-29 18:08:31,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.06 | bwd_microstep: 5004.82 | bwd_inner_microstep: 4985.43 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-29 18:08:40,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.52 | bwd_microstep: 5167.35 | bwd_inner_microstep: 5094.77 | bwd_allreduce_microstep: 72.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-07-29 18:08:48,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3225.96 | bwd_microstep: 4864.18 | bwd_inner_microstep: 4821.85 | bwd_allreduce_microstep: 42.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-29 18:08:57,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.24 | bwd_microstep: 4913.02 | bwd_inner_microstep: 4893.59 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684 [2024-07-29 18:09:06,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-07-29 18:09:06,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.20 | bwd_microstep: 4993.68 | bwd_inner_microstep: 4946.12 | bwd_allreduce_microstep: 47.50 | step_microstep: 
181.39 [2024-07-29 18:09:06,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28837.26 | bwd: 40588.95 | bwd_inner: 39868.71 | bwd_allreduce: 719.76 | step: 181.96 49%|████▉ | 330/671 [6:25:52<6:36:20, 69.74s/it] {'loss': 1.1622, 'learning_rate': 1.077254458679689e-05, 'epoch': 0.49} 49%|████▉ | 330/671 [6:25:52<6:36:20, 69.74s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3931 [2024-07-29 18:09:15,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.21 | bwd_microstep: 5503.40 | bwd_inner_microstep: 5420.95 | bwd_allreduce_microstep: 82.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2288 [2024-07-29 18:09:24,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.51 | bwd_microstep: 5225.13 | bwd_inner_microstep: 4820.07 | bwd_allreduce_microstep: 404.99 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2282 [2024-07-29 18:09:33,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.13 | bwd_microstep: 5206.81 | bwd_inner_microstep: 4803.20 | bwd_allreduce_microstep: 403.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3802 [2024-07-29 18:09:41,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.02 | bwd_microstep: 5046.83 | bwd_inner_microstep: 5027.53 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3782 [2024-07-29 18:09:50,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.18 | bwd_microstep: 5034.22 | bwd_inner_microstep: 5014.79 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3650 [2024-07-29 18:09:59,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.20 | 
bwd_microstep: 5116.00 | bwd_inner_microstep: 5033.77 | bwd_allreduce_microstep: 82.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-07-29 18:10:08,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.51 | bwd_microstep: 5122.51 | bwd_inner_microstep: 5057.82 | bwd_allreduce_microstep: 64.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-29 18:10:16,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 18:10:16,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.72 | bwd_microstep: 5025.03 | bwd_inner_microstep: 4969.66 | bwd_allreduce_microstep: 55.31 | step_microstep: 181.43 [2024-07-29 18:10:16,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29063.39 | bwd: 41279.90 | bwd_inner: 40147.72 | bwd_allreduce: 1131.70 | step: 182.00 49%|████▉ | 331/671 [6:27:02<6:36:45, 70.02s/it] {'loss': 1.1919, 'learning_rate': 1.0724348001617626e-05, 'epoch': 0.49} 49%|████▉ | 331/671 [6:27:02<6:36:45, 70.02s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3916 [2024-07-29 18:10:25,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3821.10 | bwd_microstep: 5213.22 | bwd_inner_microstep: 5188.28 | bwd_allreduce_microstep: 24.87 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3820 [2024-07-29 18:10:34,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.48 | bwd_microstep: 5124.21 | bwd_inner_microstep: 5100.54 | bwd_allreduce_microstep: 23.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3804 [2024-07-29 18:10:43,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3298.70 | bwd_microstep: 4917.82 | bwd_inner_microstep: 4890.35 | bwd_allreduce_microstep: 
27.41 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3642 [2024-07-29 18:10:52,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3645.22 | bwd_microstep: 5221.52 | bwd_inner_microstep: 5114.99 | bwd_allreduce_microstep: 106.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732 [2024-07-29 18:11:00,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3209.22 | bwd_microstep: 4789.18 | bwd_inner_microstep: 4769.83 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-29 18:11:08,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.26 | bwd_microstep: 4999.84 | bwd_inner_microstep: 4980.45 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3683 [2024-07-29 18:11:17,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.18 | bwd_microstep: 5189.14 | bwd_inner_microstep: 5099.60 | bwd_allreduce_microstep: 89.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-07-29 18:11:26,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-29 18:11:26,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.17 | bwd_microstep: 5073.49 | bwd_inner_microstep: 5017.56 | bwd_allreduce_microstep: 55.87 | step_microstep: 181.01 [2024-07-29 18:11:26,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28679.24 | bwd: 40528.40 | bwd_inner: 40161.54 | bwd_allreduce: 366.39 | step: 181.59 49%|████▉ | 332/671 [6:28:12<6:34:46, 69.87s/it] {'loss': 1.1744, 'learning_rate': 1.0676134495694437e-05, 'epoch': 0.49} 49%|████▉ | 332/671 [6:28:12<6:34:46, 69.87s/it]dynamic ViT batch size: 10, images per sample: 5.0, 
dynamic token length: 2011 [2024-07-29 18:11:35,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.71 | bwd_microstep: 5230.64 | bwd_inner_microstep: 4825.09 | bwd_allreduce_microstep: 405.49 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3774 [2024-07-29 18:11:44,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.57 | bwd_microstep: 5205.76 | bwd_inner_microstep: 5148.01 | bwd_allreduce_microstep: 57.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3760 [2024-07-29 18:11:52,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.21 | bwd_microstep: 5183.90 | bwd_inner_microstep: 5130.47 | bwd_allreduce_microstep: 53.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2206 [2024-07-29 18:12:01,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.72 | bwd_microstep: 5224.32 | bwd_inner_microstep: 4816.04 | bwd_allreduce_microstep: 408.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3713 [2024-07-29 18:12:10,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.09 | bwd_microstep: 5135.90 | bwd_inner_microstep: 5079.91 | bwd_allreduce_microstep: 55.92 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3744 [2024-07-29 18:12:19,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.22 | bwd_microstep: 5035.17 | bwd_inner_microstep: 5012.01 | bwd_allreduce_microstep: 23.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620 [2024-07-29 18:12:27,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.18 | bwd_microstep: 4830.45 | bwd_inner_microstep: 4786.89 | bwd_allreduce_microstep: 43.49 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-29 18:12:36,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 18:12:36,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.89 | bwd_microstep: 5052.40 | bwd_inner_microstep: 4992.24 | bwd_allreduce_microstep: 60.09 | step_microstep: 180.98 [2024-07-29 18:12:36,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28544.50 | bwd: 40898.51 | bwd_inner: 39790.61 | bwd_allreduce: 1107.44 | step: 181.67 50%|████▉ | 333/671 [6:29:22<6:33:26, 69.84s/it] {'loss': 1.1615, 'learning_rate': 1.0627905195293135e-05, 'epoch': 0.5} 50%|████▉ | 333/671 [6:29:22<6:33:26, 69.84s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2392 [2024-07-29 18:12:45,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.54 | bwd_microstep: 5226.42 | bwd_inner_microstep: 4824.35 | bwd_allreduce_microstep: 402.01 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2046 [2024-07-29 18:12:53,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3056.45 | bwd_microstep: 5111.21 | bwd_inner_microstep: 4720.89 | bwd_allreduce_microstep: 390.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3753 [2024-07-29 18:13:01,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.49 | bwd_microstep: 5159.92 | bwd_inner_microstep: 5107.76 | bwd_allreduce_microstep: 52.10 | step_microstep: 0.10 dynamic ViT batch size: 7, images per sample: 3.5, dynamic token length: 1321 [2024-07-29 18:13:10,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.72 | bwd_microstep: 5261.53 | bwd_inner_microstep: 4855.76 | bwd_allreduce_microstep: 405.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token 
length: 2216 [2024-07-29 18:13:19,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.80 | bwd_microstep: 5203.47 | bwd_inner_microstep: 4796.28 | bwd_allreduce_microstep: 407.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-07-29 18:13:27,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3208.05 | bwd_microstep: 4791.82 | bwd_inner_microstep: 4757.00 | bwd_allreduce_microstep: 34.74 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2127 [2024-07-29 18:13:36,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.12 | bwd_microstep: 5190.62 | bwd_inner_microstep: 4786.95 | bwd_allreduce_microstep: 403.59 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-29 18:13:45,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 18:13:45,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3487.85 | bwd_microstep: 5067.58 | bwd_inner_microstep: 4676.74 | bwd_allreduce_microstep: 390.77 | step_microstep: 182.16 [2024-07-29 18:13:45,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27505.92 | bwd: 41012.55 | bwd_inner: 38525.66 | bwd_allreduce: 2486.40 | step: 182.75 50%|████▉ | 334/671 [6:30:31<6:30:35, 69.54s/it] {'loss': 1.1561, 'learning_rate': 1.0579661227048484e-05, 'epoch': 0.5} 50%|████▉ | 334/671 [6:30:31<6:30:35, 69.54s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3656 [2024-07-29 18:13:54,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.85 | bwd_microstep: 5341.85 | bwd_inner_microstep: 5249.26 | bwd_allreduce_microstep: 92.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-29 18:14:03,068] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3637.80 | bwd_microstep: 5300.27 | bwd_inner_microstep: 5204.42 | bwd_allreduce_microstep: 95.77 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3778 [2024-07-29 18:14:11,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.13 | bwd_microstep: 5028.72 | bwd_inner_microstep: 5009.34 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2253 [2024-07-29 18:14:20,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.01 | bwd_microstep: 5223.03 | bwd_inner_microstep: 4815.75 | bwd_allreduce_microstep: 407.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3732 [2024-07-29 18:14:29,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.51 | bwd_microstep: 5046.21 | bwd_inner_microstep: 4983.56 | bwd_allreduce_microstep: 62.59 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-29 18:14:38,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.42 | bwd_microstep: 4988.66 | bwd_inner_microstep: 4969.29 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.10 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2161 [2024-07-29 18:14:46,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3475.84 | bwd_microstep: 5044.51 | bwd_inner_microstep: 4653.97 | bwd_allreduce_microstep: 390.47 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3694 [2024-07-29 18:14:55,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 18:14:55,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.23 | bwd_microstep: 5058.24 | bwd_inner_microstep: 4982.08 | bwd_allreduce_microstep: 76.11 | step_microstep: 
180.58 [2024-07-29 18:14:55,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28982.71 | bwd: 41031.48 | bwd_inner: 39867.62 | bwd_allreduce: 1163.39 | step: 181.18 50%|████▉ | 335/671 [6:31:41<6:30:46, 69.78s/it] {'loss': 1.194, 'learning_rate': 1.0531403717937888e-05, 'epoch': 0.5} 50%|████▉ | 335/671 [6:31:41<6:30:46, 69.78s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3945 [2024-07-29 18:15:03,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3200.93 | bwd_microstep: 5062.06 | bwd_inner_microstep: 5026.76 | bwd_allreduce_microstep: 35.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2333 [2024-07-29 18:15:12,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.17 | bwd_microstep: 5225.92 | bwd_inner_microstep: 4820.60 | bwd_allreduce_microstep: 405.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3795 [2024-07-29 18:15:21,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.27 | bwd_microstep: 5033.02 | bwd_inner_microstep: 5013.67 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-07-29 18:15:29,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3036.53 | bwd_microstep: 4999.71 | bwd_inner_microstep: 4613.10 | bwd_allreduce_microstep: 386.55 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3655 [2024-07-29 18:15:37,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3132.74 | bwd_microstep: 4943.78 | bwd_inner_microstep: 4891.61 | bwd_allreduce_microstep: 52.11 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2163 [2024-07-29 18:15:46,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.82 | 
bwd_microstep: 5236.41 | bwd_inner_microstep: 4827.66 | bwd_allreduce_microstep: 408.68 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3691 [2024-07-29 18:15:54,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.40 | bwd_microstep: 5062.65 | bwd_inner_microstep: 4990.05 | bwd_allreduce_microstep: 72.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2173 [2024-07-29 18:16:03,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 18:16:03,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.85 | bwd_microstep: 5152.48 | bwd_inner_microstep: 4751.87 | bwd_allreduce_microstep: 400.54 | step_microstep: 182.45 [2024-07-29 18:16:03,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27313.63 | bwd: 40716.01 | bwd_inner: 38935.25 | bwd_allreduce: 1780.28 | step: 183.03 50%|█████ | 336/671 [6:32:49<6:27:13, 69.36s/it] {'loss': 1.1245, 'learning_rate': 1.0483133795255072e-05, 'epoch': 0.5} 50%|█████ | 336/671 [6:32:49<6:27:13, 69.36s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620 [2024-07-29 18:16:12,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3673.19 | bwd_microstep: 5322.17 | bwd_inner_microstep: 5230.08 | bwd_allreduce_microstep: 92.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3849 [2024-07-29 18:16:21,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.73 | bwd_microstep: 5152.93 | bwd_inner_microstep: 5109.90 | bwd_allreduce_microstep: 42.97 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2250 [2024-07-29 18:16:30,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.05 | bwd_microstep: 5187.60 | bwd_inner_microstep: 4782.32 | bwd_allreduce_microstep: 405.22 
| step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3776 [2024-07-29 18:16:39,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.02 | bwd_microstep: 5143.30 | bwd_inner_microstep: 5091.34 | bwd_allreduce_microstep: 51.90 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3628 [2024-07-29 18:16:47,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3185.65 | bwd_microstep: 4704.69 | bwd_inner_microstep: 4679.68 | bwd_allreduce_microstep: 24.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-07-29 18:16:55,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.88 | bwd_microstep: 5079.78 | bwd_inner_microstep: 5017.40 | bwd_allreduce_microstep: 62.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3727 [2024-07-29 18:17:04,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.84 | bwd_microstep: 5050.05 | bwd_inner_microstep: 5009.59 | bwd_allreduce_microstep: 40.39 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-29 18:17:13,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.86 [2024-07-29 18:17:13,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.07 | bwd_microstep: 4998.24 | bwd_inner_microstep: 4978.85 | bwd_allreduce_microstep: 19.32 | step_microstep: 181.88 [2024-07-29 18:17:13,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28573.34 | bwd: 40638.73 | bwd_inner: 39899.11 | bwd_allreduce: 739.16 | step: 182.48 50%|█████ | 337/671 [6:33:59<6:26:23, 69.41s/it] {'loss': 1.1741, 'learning_rate': 1.0434852586583734e-05, 'epoch': 0.5} 50%|█████ | 337/671 [6:33:59<6:26:23, 69.41s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic 
token length: 3552 [2024-07-29 18:17:22,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.49 | bwd_microstep: 5354.51 | bwd_inner_microstep: 5220.54 | bwd_allreduce_microstep: 133.90 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3807 [2024-07-29 18:17:31,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.41 | bwd_microstep: 5041.44 | bwd_inner_microstep: 5022.07 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3735 [2024-07-29 18:17:39,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.07 | bwd_microstep: 5187.56 | bwd_inner_microstep: 5111.89 | bwd_allreduce_microstep: 75.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650 [2024-07-29 18:17:48,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.06 | bwd_microstep: 5171.45 | bwd_inner_microstep: 5095.28 | bwd_allreduce_microstep: 76.10 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2200 [2024-07-29 18:17:57,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.22 | bwd_microstep: 5156.03 | bwd_inner_microstep: 4755.40 | bwd_allreduce_microstep: 400.56 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2126 [2024-07-29 18:18:06,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.96 | bwd_microstep: 5221.85 | bwd_inner_microstep: 4815.92 | bwd_allreduce_microstep: 405.86 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2147 [2024-07-29 18:18:15,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.20 | bwd_microstep: 5253.52 | bwd_inner_microstep: 4845.24 | bwd_allreduce_microstep: 408.22 | step_microstep: 0.08 
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-29 18:18:24,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 18:18:24,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.95 | bwd_microstep: 5159.96 | bwd_inner_microstep: 5081.47 | bwd_allreduce_microstep: 78.43 | step_microstep: 180.84 [2024-07-29 18:18:24,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28858.25 | bwd: 41546.30 | bwd_inner: 39947.73 | bwd_allreduce: 1598.10 | step: 181.42 50%|█████ | 338/671 [6:35:10<6:27:26, 69.81s/it] {'loss': 1.1384, 'learning_rate': 1.0386561219771222e-05, 'epoch': 0.5} 50%|█████ | 338/671 [6:35:10<6:27:26, 69.81s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2382 [2024-07-29 18:18:31,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3002.42 | bwd_microstep: 4837.17 | bwd_inner_microstep: 4469.41 | bwd_allreduce_microstep: 367.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3600 [2024-07-29 18:18:40,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3666.71 | bwd_microstep: 5315.71 | bwd_inner_microstep: 5220.66 | bwd_allreduce_microstep: 94.99 | step_microstep: 0.18 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2073 [2024-07-29 18:18:49,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.44 | bwd_microstep: 5176.55 | bwd_inner_microstep: 4774.80 | bwd_allreduce_microstep: 401.69 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2222 [2024-07-29 18:18:58,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.59 | bwd_microstep: 5166.24 | bwd_inner_microstep: 4765.56 | bwd_allreduce_microstep: 400.61 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 
[2024-07-29 18:19:07,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.45 | bwd_microstep: 4977.15 | bwd_inner_microstep: 4957.68 | bwd_allreduce_microstep: 19.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3748 [2024-07-29 18:19:15,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.89 | bwd_microstep: 4991.68 | bwd_inner_microstep: 4972.32 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-07-29 18:19:24,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.25 | bwd_microstep: 5041.57 | bwd_inner_microstep: 4983.40 | bwd_allreduce_microstep: 58.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-07-29 18:19:32,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.77 [2024-07-29 18:19:32,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3190.69 | bwd_microstep: 4694.15 | bwd_inner_microstep: 4674.71 | bwd_allreduce_microstep: 19.37 | step_microstep: 181.70 [2024-07-29 18:19:32,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27910.33 | bwd: 40200.20 | bwd_inner: 38818.49 | bwd_allreduce: 1381.24 | step: 182.37 51%|█████ | 339/671 [6:36:18<6:24:00, 69.40s/it] {'loss': 1.1889, 'learning_rate': 1.0338260822902166e-05, 'epoch': 0.5} 51%|█████ | 339/671 [6:36:18<6:24:00, 69.40s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3965 [2024-07-29 18:19:41,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.88 | bwd_microstep: 5177.75 | bwd_inner_microstep: 5130.81 | bwd_allreduce_microstep: 46.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3590 [2024-07-29 18:19:50,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3634.89 | bwd_microstep: 5237.96 | bwd_inner_microstep: 5148.76 | bwd_allreduce_microstep: 89.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-29 18:19:59,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.65 | bwd_microstep: 5241.91 | bwd_inner_microstep: 5156.08 | bwd_allreduce_microstep: 85.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-07-29 18:20:07,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.28 | bwd_microstep: 5176.78 | bwd_inner_microstep: 5096.01 | bwd_allreduce_microstep: 80.70 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3755 [2024-07-29 18:20:16,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3114.35 | bwd_microstep: 4984.03 | bwd_inner_microstep: 4938.60 | bwd_allreduce_microstep: 45.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2209 [2024-07-29 18:20:24,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3063.85 | bwd_microstep: 5044.32 | bwd_inner_microstep: 4656.35 | bwd_allreduce_microstep: 387.90 | step_microstep: 0.08 dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1619 [2024-07-29 18:20:32,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.00 | bwd_microstep: 5172.62 | bwd_inner_microstep: 4772.22 | bwd_allreduce_microstep: 400.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-07-29 18:20:41,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 18:20:41,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.29 | bwd_microstep: 5083.75 | bwd_inner_microstep: 5017.97 | bwd_allreduce_microstep: 65.72 | step_microstep: 180.55 
[2024-07-29 18:20:41,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27761.09 | bwd: 41119.10 | bwd_inner: 39916.74 | bwd_allreduce: 1201.89 | step: 181.13 51%|█████ | 340/671 [6:37:27<6:22:31, 69.34s/it] {'loss': 1.2371, 'learning_rate': 1.0289952524272147e-05, 'epoch': 0.51} 51%|█████ | 340/671 [6:37:27<6:22:31, 69.34s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3945 [2024-07-29 18:20:50,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.10 | bwd_microstep: 5283.65 | bwd_inner_microstep: 5227.49 | bwd_allreduce_microstep: 56.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3795 [2024-07-29 18:20:59,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.91 | bwd_microstep: 5212.22 | bwd_inner_microstep: 5157.46 | bwd_allreduce_microstep: 54.69 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2268 [2024-07-29 18:21:08,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.06 | bwd_microstep: 5162.34 | bwd_inner_microstep: 4759.55 | bwd_allreduce_microstep: 402.73 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2167 [2024-07-29 18:21:17,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.95 | bwd_microstep: 5226.75 | bwd_inner_microstep: 4820.45 | bwd_allreduce_microstep: 406.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722 [2024-07-29 18:21:25,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.17 | bwd_microstep: 5047.32 | bwd_inner_microstep: 5019.97 | bwd_allreduce_microstep: 27.29 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2133 [2024-07-29 18:21:33,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3048.98 | 
bwd_microstep: 4998.43 | bwd_inner_microstep: 4612.10 | bwd_allreduce_microstep: 386.27 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3683 [2024-07-29 18:21:42,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.43 | bwd_microstep: 5151.14 | bwd_inner_microstep: 5095.63 | bwd_allreduce_microstep: 55.44 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2182 [2024-07-29 18:21:50,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 18:21:50,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3010.67 | bwd_microstep: 4911.05 | bwd_inner_microstep: 4532.66 | bwd_allreduce_microstep: 378.32 | step_microstep: 181.35 [2024-07-29 18:21:50,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27848.17 | bwd: 40992.89 | bwd_inner: 39225.25 | bwd_allreduce: 1767.18 | step: 181.92 51%|█████ | 341/671 [6:38:36<6:21:05, 69.29s/it] {'loss': 1.1536, 'learning_rate': 1.0241637452361327e-05, 'epoch': 0.51} 51%|█████ | 341/671 [6:38:36<6:21:05, 69.29s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2433 [2024-07-29 18:21:59,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.12 | bwd_microstep: 5175.37 | bwd_inner_microstep: 4773.25 | bwd_allreduce_microstep: 402.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3781 [2024-07-29 18:22:08,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.72 | bwd_microstep: 5043.85 | bwd_inner_microstep: 5021.62 | bwd_allreduce_microstep: 22.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3790 [2024-07-29 18:22:17,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3649.50 | bwd_microstep: 5194.20 | bwd_inner_microstep: 5137.42 | bwd_allreduce_microstep: 
56.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3738 [2024-07-29 18:22:26,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.04 | bwd_microstep: 5187.14 | bwd_inner_microstep: 5128.88 | bwd_allreduce_microstep: 58.20 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741 [2024-07-29 18:22:34,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.76 | bwd_microstep: 4995.66 | bwd_inner_microstep: 4976.29 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-29 18:22:43,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.95 | bwd_microstep: 5106.40 | bwd_inner_microstep: 5042.11 | bwd_allreduce_microstep: 64.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709 [2024-07-29 18:22:52,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.09 | bwd_microstep: 4959.96 | bwd_inner_microstep: 4932.09 | bwd_allreduce_microstep: 27.80 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-07-29 18:23:01,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.88 [2024-07-29 18:23:01,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.87 | bwd_microstep: 4994.92 | bwd_inner_microstep: 4975.40 | bwd_allreduce_microstep: 19.45 | step_microstep: 182.15 [2024-07-29 18:23:01,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29408.94 | bwd: 40657.49 | bwd_inner: 39987.01 | bwd_allreduce: 670.01 | step: 182.73 51%|█████ | 342/671 [6:39:47<6:21:45, 69.62s/it] {'loss': 1.1573, 'learning_rate': 1.0193316735808085e-05, 'epoch': 0.51} 51%|█████ | 342/671 [6:39:47<6:21:45, 69.62s/it]dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3825 [2024-07-29 18:23:09,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3296.95 | bwd_microstep: 4960.34 | bwd_inner_microstep: 4926.73 | bwd_allreduce_microstep: 33.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599 [2024-07-29 18:23:18,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.89 | bwd_microstep: 5226.05 | bwd_inner_microstep: 5132.18 | bwd_allreduce_microstep: 93.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3598 [2024-07-29 18:23:27,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.85 | bwd_microstep: 5208.89 | bwd_inner_microstep: 5125.72 | bwd_allreduce_microstep: 83.11 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2249 [2024-07-29 18:23:35,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3013.55 | bwd_microstep: 4966.31 | bwd_inner_microstep: 4582.33 | bwd_allreduce_microstep: 383.91 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3718 [2024-07-29 18:23:44,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.00 | bwd_microstep: 5205.15 | bwd_inner_microstep: 5143.15 | bwd_allreduce_microstep: 61.93 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201 [2024-07-29 18:23:52,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3479.23 | bwd_microstep: 5056.75 | bwd_inner_microstep: 4664.93 | bwd_allreduce_microstep: 391.74 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3785 [2024-07-29 18:24:01,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.94 | bwd_microstep: 5027.18 | bwd_inner_microstep: 4988.07 | bwd_allreduce_microstep: 39.05 | step_microstep: 
0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2139 [2024-07-29 18:24:10,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 18:24:10,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.64 | bwd_microstep: 5076.18 | bwd_inner_microstep: 4681.71 | bwd_allreduce_microstep: 394.40 | step_microstep: 182.14 [2024-07-29 18:24:10,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27741.96 | bwd: 40726.84 | bwd_inner: 39244.77 | bwd_allreduce: 1481.59 | step: 182.83 51%|█████ | 343/671 [6:40:56<6:19:15, 69.38s/it] {'loss': 1.1573, 'learning_rate': 1.0144991503382676e-05, 'epoch': 0.51} 51%|█████ | 343/671 [6:40:56<6:19:15, 69.38s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3885 [2024-07-29 18:24:19,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3816.19 | bwd_microstep: 5135.81 | bwd_inner_microstep: 5116.57 | bwd_allreduce_microstep: 19.18 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3796 [2024-07-29 18:24:27,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.05 | bwd_microstep: 5106.71 | bwd_inner_microstep: 5064.10 | bwd_allreduce_microstep: 42.55 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3791 [2024-07-29 18:24:36,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.20 | bwd_microstep: 5029.56 | bwd_inner_microstep: 5010.25 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3707 [2024-07-29 18:24:45,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.53 | bwd_microstep: 5103.74 | bwd_inner_microstep: 5026.75 | bwd_allreduce_microstep: 76.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735 
[2024-07-29 18:24:53,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.04 | bwd_microstep: 5094.86 | bwd_inner_microstep: 5051.12 | bwd_allreduce_microstep: 43.67 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-07-29 18:25:01,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3009.87 | bwd_microstep: 4876.31 | bwd_inner_microstep: 4499.99 | bwd_allreduce_microstep: 376.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-29 18:25:09,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3226.78 | bwd_microstep: 4829.21 | bwd_inner_microstep: 4788.86 | bwd_allreduce_microstep: 40.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-07-29 18:25:18,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 18:25:18,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.02 | bwd_microstep: 4907.00 | bwd_inner_microstep: 4886.90 | bwd_allreduce_microstep: 20.04 | step_microstep: 180.97 [2024-07-29 18:25:18,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28300.57 | bwd: 40083.20 | bwd_inner: 39444.49 | bwd_allreduce: 638.23 | step: 181.56 51%|█████▏ | 344/671 [6:42:04<6:17:00, 69.18s/it] {'loss': 1.1788, 'learning_rate': 1.0096662883960833e-05, 'epoch': 0.51} 51%|█████▏ | 344/671 [6:42:04<6:17:00, 69.18s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3908 [2024-07-29 18:25:27,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3837.75 | bwd_microstep: 5214.91 | bwd_inner_microstep: 5190.81 | bwd_allreduce_microstep: 24.03 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2229 [2024-07-29 18:25:36,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3552.13 | bwd_microstep: 5252.52 | bwd_inner_microstep: 4844.30 | bwd_allreduce_microstep: 408.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3795 [2024-07-29 18:25:44,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3238.58 | bwd_microstep: 4845.85 | bwd_inner_microstep: 4826.49 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625 [2024-07-29 18:25:53,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.35 | bwd_microstep: 5174.68 | bwd_inner_microstep: 5097.29 | bwd_allreduce_microstep: 77.32 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3738 [2024-07-29 18:26:02,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.19 | bwd_microstep: 5217.22 | bwd_inner_microstep: 5136.55 | bwd_allreduce_microstep: 80.61 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3697 [2024-07-29 18:26:11,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3695.56 | bwd_microstep: 4916.20 | bwd_inner_microstep: 4896.77 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.18 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2149 [2024-07-29 18:26:19,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.60 | bwd_microstep: 5120.16 | bwd_inner_microstep: 4723.65 | bwd_allreduce_microstep: 396.44 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712 [2024-07-29 18:26:28,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-29 18:26:28,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.18 | bwd_microstep: 4909.94 | bwd_inner_microstep: 4890.68 | bwd_allreduce_microstep: 19.19 | step_microstep: 
181.03 [2024-07-29 18:26:28,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28787.24 | bwd: 40651.46 | bwd_inner: 39606.48 | bwd_allreduce: 1044.51 | step: 181.72 51%|█████▏ | 345/671 [6:43:14<6:16:49, 69.35s/it] {'loss': 1.1736, 'learning_rate': 1.0048332006497406e-05, 'epoch': 0.51} 51%|█████▏ | 345/671 [6:43:14<6:16:49, 69.35s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3744 [2024-07-29 18:26:37,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.33 | bwd_microstep: 5242.13 | bwd_inner_microstep: 5179.53 | bwd_allreduce_microstep: 62.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3860 [2024-07-29 18:26:46,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.43 | bwd_microstep: 5290.74 | bwd_inner_microstep: 5230.96 | bwd_allreduce_microstep: 59.72 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3616 [2024-07-29 18:26:55,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.88 | bwd_microstep: 5215.06 | bwd_inner_microstep: 5121.05 | bwd_allreduce_microstep: 93.94 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3795 [2024-07-29 18:27:04,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.97 | bwd_microstep: 5033.20 | bwd_inner_microstep: 5013.83 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-29 18:27:12,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.58 | bwd_microstep: 5145.68 | bwd_inner_microstep: 5073.89 | bwd_allreduce_microstep: 71.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-29 18:27:21,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3614.49 | bwd_microstep: 5179.50 | bwd_inner_microstep: 5102.82 | bwd_allreduce_microstep: 76.61 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2178 [2024-07-29 18:27:30,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.25 | bwd_microstep: 5104.88 | bwd_inner_microstep: 4707.95 | bwd_allreduce_microstep: 396.87 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2182 [2024-07-29 18:27:39,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-29 18:27:39,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.72 | bwd_microstep: 5117.35 | bwd_inner_microstep: 4719.62 | bwd_allreduce_microstep: 397.66 | step_microstep: 180.90 [2024-07-29 18:27:39,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28954.54 | bwd: 41328.51 | bwd_inner: 40149.59 | bwd_allreduce: 1178.45 | step: 181.48 52%|█████▏ | 346/671 [6:44:25<6:17:43, 69.73s/it] {'loss': 1.224, 'learning_rate': 1e-05, 'epoch': 0.51} 52%|█████▏ | 346/671 [6:44:25<6:17:43, 69.73s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3920 [2024-07-29 18:27:48,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3780.98 | bwd_microstep: 5157.33 | bwd_inner_microstep: 5138.01 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3573 [2024-07-29 18:27:56,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3404.62 | bwd_microstep: 5094.48 | bwd_inner_microstep: 5021.29 | bwd_allreduce_microstep: 73.13 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2259 [2024-07-29 18:28:05,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.75 | bwd_microstep: 5132.18 | bwd_inner_microstep: 4735.74 | bwd_allreduce_microstep: 396.37 | 
step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2221 [2024-07-29 18:28:13,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3327.91 | bwd_microstep: 4995.38 | bwd_inner_microstep: 4609.13 | bwd_allreduce_microstep: 386.19 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3206 [2024-07-29 18:28:22,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.29 | bwd_microstep: 5203.84 | bwd_inner_microstep: 4974.57 | bwd_allreduce_microstep: 229.20 | step_microstep: 0.19 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2171 [2024-07-29 18:28:30,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3027.10 | bwd_microstep: 4896.60 | bwd_inner_microstep: 4521.55 | bwd_allreduce_microstep: 374.98 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-29 18:28:38,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3211.45 | bwd_microstep: 4714.86 | bwd_inner_microstep: 4689.61 | bwd_allreduce_microstep: 25.18 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-29 18:28:47,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 18:28:47,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.95 | bwd_microstep: 5068.33 | bwd_inner_microstep: 4674.70 | bwd_allreduce_microstep: 393.56 | step_microstep: 185.49 [2024-07-29 18:28:47,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27318.96 | bwd: 40262.98 | bwd_inner: 38364.55 | bwd_allreduce: 1897.96 | step: 186.21 52%|█████▏ | 347/671 [6:45:33<6:13:36, 69.19s/it] {'loss': 1.1847, 'learning_rate': 9.951667993502599e-06, 'epoch': 0.52} 52%|█████▏ | 347/671 [6:45:33<6:13:36, 69.19s/it]dynamic ViT batch size: 12, images per sample: 6.0, dynamic 
token length: 2116 [2024-07-29 18:28:56,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.88 | bwd_microstep: 5353.99 | bwd_inner_microstep: 4941.72 | bwd_allreduce_microstep: 412.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-29 18:29:04,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3227.11 | bwd_microstep: 4900.50 | bwd_inner_microstep: 4847.79 | bwd_allreduce_microstep: 52.65 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3785 [2024-07-29 18:29:13,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.12 | bwd_microstep: 5190.14 | bwd_inner_microstep: 5118.75 | bwd_allreduce_microstep: 71.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-07-29 18:29:21,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.60 | bwd_microstep: 5167.77 | bwd_inner_microstep: 5087.39 | bwd_allreduce_microstep: 80.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-29 18:29:30,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.75 | bwd_microstep: 5126.56 | bwd_inner_microstep: 5057.32 | bwd_allreduce_microstep: 69.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-29 18:29:39,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.27 | bwd_microstep: 5085.01 | bwd_inner_microstep: 5020.49 | bwd_allreduce_microstep: 64.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695 [2024-07-29 18:29:47,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.07 | bwd_microstep: 5016.25 | bwd_inner_microstep: 4959.58 | bwd_allreduce_microstep: 56.60 | step_microstep: 0.09 
dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3687 [2024-07-29 18:29:56,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.75 [2024-07-29 18:29:56,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.08 | bwd_microstep: 4891.14 | bwd_inner_microstep: 4871.75 | bwd_allreduce_microstep: 19.32 | step_microstep: 182.46 [2024-07-29 18:29:56,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28468.77 | bwd: 40731.34 | bwd_inner: 39904.73 | bwd_allreduce: 826.13 | step: 183.04 52%|█████▏ | 348/671 [6:46:42<6:13:00, 69.29s/it] {'loss': 1.2004, 'learning_rate': 9.903337116039172e-06, 'epoch': 0.52} 52%|█████▏ | 348/671 [6:46:42<6:13:00, 69.29s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3893 [2024-07-29 18:30:05,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.07 | bwd_microstep: 5246.81 | bwd_inner_microstep: 5194.88 | bwd_allreduce_microstep: 51.86 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3811 [2024-07-29 18:30:14,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3817.24 | bwd_microstep: 5126.88 | bwd_inner_microstep: 5096.89 | bwd_allreduce_microstep: 29.93 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3791 [2024-07-29 18:30:23,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.51 | bwd_microstep: 5023.25 | bwd_inner_microstep: 5003.90 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3743 [2024-07-29 18:30:32,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.68 | bwd_microstep: 5177.68 | bwd_inner_microstep: 5117.43 | bwd_allreduce_microstep: 60.18 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3743 
[2024-07-29 18:30:40,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.89 | bwd_microstep: 5098.31 | bwd_inner_microstep: 5028.32 | bwd_allreduce_microstep: 69.93 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2182 [2024-07-29 18:30:49,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3495.05 | bwd_microstep: 5064.21 | bwd_inner_microstep: 4671.63 | bwd_allreduce_microstep: 392.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-29 18:30:57,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.79 | bwd_microstep: 5002.19 | bwd_inner_microstep: 4949.37 | bwd_allreduce_microstep: 52.75 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-29 18:31:06,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-29 18:31:06,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.25 | bwd_microstep: 5007.00 | bwd_inner_microstep: 4954.61 | bwd_allreduce_microstep: 52.33 | step_microstep: 181.12 [2024-07-29 18:31:06,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29030.37 | bwd: 40746.32 | bwd_inner: 40016.97 | bwd_allreduce: 728.87 | step: 181.82 52%|█████▏ | 349/671 [6:47:52<6:13:10, 69.53s/it] {'loss': 1.1378, 'learning_rate': 9.855008496617326e-06, 'epoch': 0.52} 52%|█████▏ | 349/671 [6:47:52<6:13:10, 69.53s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2154 [2024-07-29 18:31:15,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.96 | bwd_microstep: 5473.04 | bwd_inner_microstep: 5054.87 | bwd_allreduce_microstep: 418.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3851 [2024-07-29 18:31:24,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3751.77 | bwd_microstep: 5101.48 | bwd_inner_microstep: 5082.00 | bwd_allreduce_microstep: 19.40 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2080 [2024-07-29 18:31:32,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3000.89 | bwd_microstep: 4827.91 | bwd_inner_microstep: 4454.43 | bwd_allreduce_microstep: 373.42 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2177 [2024-07-29 18:31:41,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.40 | bwd_microstep: 5136.59 | bwd_inner_microstep: 4738.21 | bwd_allreduce_microstep: 398.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-29 18:31:49,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3052.66 | bwd_microstep: 5016.97 | bwd_inner_microstep: 4631.36 | bwd_allreduce_microstep: 385.55 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-29 18:31:58,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.81 | bwd_microstep: 4970.77 | bwd_inner_microstep: 4951.40 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-29 18:32:06,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3703.13 | bwd_microstep: 4912.16 | bwd_inner_microstep: 4887.72 | bwd_allreduce_microstep: 24.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-29 18:32:15,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 18:32:15,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3485.97 | bwd_microstep: 5043.80 | bwd_inner_microstep: 4653.52 | bwd_allreduce_microstep: 390.22 | step_microstep: 
366.88 [2024-07-29 18:32:15,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27848.50 | bwd: 40482.71 | bwd_inner: 38453.44 | bwd_allreduce: 2028.78 | step: 367.45 52%|█████▏ | 350/671 [6:49:01<6:10:53, 69.33s/it] {'loss': 1.171, 'learning_rate': 9.806683264191916e-06, 'epoch': 0.52} 52%|█████▏ | 350/671 [6:49:01<6:10:53, 69.33s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3950 [2024-07-29 18:32:24,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.90 | bwd_microstep: 5142.28 | bwd_inner_microstep: 5114.54 | bwd_allreduce_microstep: 27.66 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2315 [2024-07-29 18:32:32,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3016.08 | bwd_microstep: 4975.67 | bwd_inner_microstep: 4592.18 | bwd_allreduce_microstep: 383.43 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3760 [2024-07-29 18:32:41,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.25 | bwd_microstep: 5016.09 | bwd_inner_microstep: 4995.02 | bwd_allreduce_microstep: 21.01 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2105 [2024-07-29 18:32:49,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.90 | bwd_microstep: 5162.60 | bwd_inner_microstep: 4762.07 | bwd_allreduce_microstep: 400.46 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-07-29 18:32:58,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.88 | bwd_microstep: 5175.43 | bwd_inner_microstep: 5120.34 | bwd_allreduce_microstep: 55.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-29 18:33:07,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.35 
| bwd_microstep: 5045.16 | bwd_inner_microstep: 4988.79 | bwd_allreduce_microstep: 56.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-29 18:33:15,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3490.30 | bwd_microstep: 5069.99 | bwd_inner_microstep: 4678.48 | bwd_allreduce_microstep: 391.44 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3676 [2024-07-29 18:33:24,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-29 18:33:24,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.30 | bwd_microstep: 4910.62 | bwd_inner_microstep: 4884.15 | bwd_allreduce_microstep: 26.40 | step_microstep: 181.09 [2024-07-29 18:33:24,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28431.86 | bwd: 40497.83 | bwd_inner: 39135.52 | bwd_allreduce: 1361.84 | step: 181.79 52%|█████▏ | 351/671 [6:50:10<6:09:37, 69.31s/it] {'loss': 1.162, 'learning_rate': 9.75836254763868e-06, 'epoch': 0.52} 52%|█████▏ | 351/671 [6:50:10<6:09:37, 69.31s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2369 [2024-07-29 18:33:33,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3086.72 | bwd_microstep: 5092.99 | bwd_inner_microstep: 4705.57 | bwd_allreduce_microstep: 387.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3570 [2024-07-29 18:33:41,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3228.93 | bwd_microstep: 4907.35 | bwd_inner_microstep: 4850.67 | bwd_allreduce_microstep: 56.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3597 [2024-07-29 18:33:49,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.99 | bwd_microstep: 5170.50 | bwd_inner_microstep: 5094.94 | bwd_allreduce_microstep: 
75.50 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642 [2024-07-29 18:33:58,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.58 | bwd_microstep: 5188.85 | bwd_inner_microstep: 5105.65 | bwd_allreduce_microstep: 83.13 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2203 [2024-07-29 18:34:07,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.44 | bwd_microstep: 5152.53 | bwd_inner_microstep: 4751.61 | bwd_allreduce_microstep: 400.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-29 18:34:16,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.86 | bwd_microstep: 5086.40 | bwd_inner_microstep: 5022.45 | bwd_allreduce_microstep: 63.89 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698 [2024-07-29 18:34:24,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.14 | bwd_microstep: 4934.71 | bwd_inner_microstep: 4907.76 | bwd_allreduce_microstep: 26.89 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2162 [2024-07-29 18:34:33,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 18:34:33,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3506.71 | bwd_microstep: 5142.09 | bwd_inner_microstep: 4741.20 | bwd_allreduce_microstep: 400.82 | step_microstep: 181.56 [2024-07-29 18:34:33,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27884.28 | bwd: 40675.38 | bwd_inner: 39179.79 | bwd_allreduce: 1495.13 | step: 182.15 52%|█████▏ | 352/671 [6:51:19<6:07:48, 69.18s/it] {'loss': 1.1802, 'learning_rate': 9.710047475727858e-06, 'epoch': 0.52} 52%|█████▏ | 352/671 [6:51:19<6:07:48, 69.18s/it]dynamic ViT batch size: 14, images per sample: 7.0, 
dynamic token length: 3592 [2024-07-29 18:34:42,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.80 | bwd_microstep: 5221.37 | bwd_inner_microstep: 5114.56 | bwd_allreduce_microstep: 106.74 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2309 [2024-07-29 18:34:51,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.23 | bwd_microstep: 5276.30 | bwd_inner_microstep: 4866.65 | bwd_allreduce_microstep: 409.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3614 [2024-07-29 18:35:00,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.45 | bwd_microstep: 5184.19 | bwd_inner_microstep: 5102.67 | bwd_allreduce_microstep: 81.46 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3645 [2024-07-29 18:35:09,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.04 | bwd_microstep: 5171.44 | bwd_inner_microstep: 5073.82 | bwd_allreduce_microstep: 97.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3766 [2024-07-29 18:35:17,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.26 | bwd_microstep: 5130.27 | bwd_inner_microstep: 5079.66 | bwd_allreduce_microstep: 50.55 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3728 [2024-07-29 18:35:26,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.27 | bwd_microstep: 5132.64 | bwd_inner_microstep: 5068.10 | bwd_allreduce_microstep: 64.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-07-29 18:35:35,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.29 | bwd_microstep: 4888.56 | bwd_inner_microstep: 4869.17 | bwd_allreduce_microstep: 19.32 | step_microstep: 
0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-29 18:35:43,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.85 [2024-07-29 18:35:43,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.45 | bwd_microstep: 4905.01 | bwd_inner_microstep: 4882.58 | bwd_allreduce_microstep: 22.36 | step_microstep: 182.05 [2024-07-29 18:35:43,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29020.69 | bwd: 40909.76 | bwd_inner: 40057.15 | bwd_allreduce: 852.14 | step: 182.63 53%|█████▎ | 353/671 [6:52:29<6:08:21, 69.50s/it] {'loss': 1.1686, 'learning_rate': 9.661739177097834e-06, 'epoch': 0.53} 53%|█████▎ | 353/671 [6:52:29<6:08:21, 69.50s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2337 [2024-07-29 18:35:52,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3115.94 | bwd_microstep: 5192.28 | bwd_inner_microstep: 4796.76 | bwd_allreduce_microstep: 395.45 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2303 [2024-07-29 18:36:01,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.49 | bwd_microstep: 5348.14 | bwd_inner_microstep: 4933.95 | bwd_allreduce_microstep: 414.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3597 [2024-07-29 18:36:10,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.42 | bwd_microstep: 5231.32 | bwd_inner_microstep: 5139.33 | bwd_allreduce_microstep: 91.93 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3658 [2024-07-29 18:36:18,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.11 | bwd_microstep: 4844.33 | bwd_inner_microstep: 4820.67 | bwd_allreduce_microstep: 23.60 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725 
[2024-07-29 18:36:26,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3234.34 | bwd_microstep: 4832.33 | bwd_inner_microstep: 4806.54 | bwd_allreduce_microstep: 25.72 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707 [2024-07-29 18:36:35,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.68 | bwd_microstep: 5032.74 | bwd_inner_microstep: 4992.36 | bwd_allreduce_microstep: 40.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-29 18:36:44,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.89 | bwd_microstep: 4988.49 | bwd_inner_microstep: 4969.13 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642 [2024-07-29 18:36:52,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-29 18:36:52,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.44 | bwd_microstep: 5075.39 | bwd_inner_microstep: 5010.81 | bwd_allreduce_microstep: 64.51 | step_microstep: 181.73 [2024-07-29 18:36:52,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28164.21 | bwd: 40545.00 | bwd_inner: 39469.49 | bwd_allreduce: 1075.04 | step: 182.43 53%|█████▎ | 354/671 [6:53:38<6:06:28, 69.36s/it] {'loss': 1.1529, 'learning_rate': 9.61343878022878e-06, 'epoch': 0.53} 53%|█████▎ | 354/671 [6:53:38<6:06:28, 69.36s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3900 [2024-07-29 18:37:01,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3800.88 | bwd_microstep: 5132.49 | bwd_inner_microstep: 5113.40 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3826 [2024-07-29 18:37:10,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3750.45 | bwd_microstep: 5042.89 | bwd_inner_microstep: 5023.54 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2260 [2024-07-29 18:37:19,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.16 | bwd_microstep: 5305.43 | bwd_inner_microstep: 4892.96 | bwd_allreduce_microstep: 412.40 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2248 [2024-07-29 18:37:28,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.41 | bwd_microstep: 5171.81 | bwd_inner_microstep: 4770.52 | bwd_allreduce_microstep: 401.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3712 [2024-07-29 18:37:36,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3232.76 | bwd_microstep: 4841.01 | bwd_inner_microstep: 4799.12 | bwd_allreduce_microstep: 41.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-07-29 18:37:44,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3231.29 | bwd_microstep: 4857.65 | bwd_inner_microstep: 4815.84 | bwd_allreduce_microstep: 41.74 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172 [2024-07-29 18:37:53,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.72 | bwd_microstep: 5113.26 | bwd_inner_microstep: 4713.83 | bwd_allreduce_microstep: 399.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-29 18:38:01,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.45 [2024-07-29 18:38:01,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3225.36 | bwd_microstep: 4828.78 | bwd_inner_microstep: 4785.86 | bwd_allreduce_microstep: 42.86 | step_microstep: 
181.61 [2024-07-29 18:38:01,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27872.93 | bwd: 40293.30 | bwd_inner: 38915.03 | bwd_allreduce: 1377.80 | step: 182.21 53%|█████▎ | 355/671 [6:54:47<6:03:56, 69.10s/it] {'loss': 1.1879, 'learning_rate': 9.565147413416266e-06, 'epoch': 0.53} 53%|█████▎ | 355/671 [6:54:47<6:03:56, 69.10s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3591 [2024-07-29 18:38:10,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.75 | bwd_microstep: 5137.85 | bwd_inner_microstep: 5064.00 | bwd_allreduce_microstep: 73.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3605 [2024-07-29 18:38:19,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.19 | bwd_microstep: 5172.05 | bwd_inner_microstep: 5092.36 | bwd_allreduce_microstep: 79.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3764 [2024-07-29 18:38:27,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.72 | bwd_microstep: 5025.17 | bwd_inner_microstep: 5003.42 | bwd_allreduce_microstep: 21.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-07-29 18:38:36,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.74 | bwd_microstep: 5161.53 | bwd_inner_microstep: 5105.46 | bwd_allreduce_microstep: 56.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645 [2024-07-29 18:38:45,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.29 | bwd_microstep: 5166.73 | bwd_inner_microstep: 5088.14 | bwd_allreduce_microstep: 78.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-29 18:38:54,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3584.25 | bwd_microstep: 5052.10 | bwd_inner_microstep: 4990.55 | bwd_allreduce_microstep: 61.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-07-29 18:39:02,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.29 | bwd_microstep: 4985.71 | bwd_inner_microstep: 4938.65 | bwd_allreduce_microstep: 46.99 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2137 [2024-07-29 18:39:10,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 18:39:10,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3032.42 | bwd_microstep: 4903.41 | bwd_inner_microstep: 4527.04 | bwd_allreduce_microstep: 376.31 | step_microstep: 181.05 [2024-07-29 18:39:10,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28406.57 | bwd: 40604.53 | bwd_inner: 39809.56 | bwd_allreduce: 794.51 | step: 181.63 53%|█████▎ | 356/671 [6:55:56<6:03:09, 69.17s/it] {'loss': 1.1072, 'learning_rate': 9.516866204744932e-06, 'epoch': 0.53} 53%|█████▎ | 356/671 [6:55:56<6:03:09, 69.17s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3584 [2024-07-29 18:39:19,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.05 | bwd_microstep: 5148.66 | bwd_inner_microstep: 5050.85 | bwd_allreduce_microstep: 97.74 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3755 [2024-07-29 18:39:28,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.88 | bwd_microstep: 5151.52 | bwd_inner_microstep: 5098.77 | bwd_allreduce_microstep: 52.69 | step_microstep: 0.07 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2212 [2024-07-29 18:39:36,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3028.77 | bwd_microstep: 4964.02 | bwd_inner_microstep: 4580.68 | 
bwd_allreduce_microstep: 383.28 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3749 [2024-07-29 18:39:44,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3113.04 | bwd_microstep: 4952.96 | bwd_inner_microstep: 4913.29 | bwd_allreduce_microstep: 39.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-07-29 18:39:53,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.35 | bwd_microstep: 5119.59 | bwd_inner_microstep: 5048.54 | bwd_allreduce_microstep: 70.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663 [2024-07-29 18:40:01,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.95 | bwd_microstep: 5065.82 | bwd_inner_microstep: 5006.89 | bwd_allreduce_microstep: 58.87 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3718 [2024-07-29 18:40:10,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.35 | bwd_microstep: 5022.94 | bwd_inner_microstep: 4996.51 | bwd_allreduce_microstep: 26.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-29 18:40:19,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-29 18:40:19,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.34 | bwd_microstep: 5024.30 | bwd_inner_microstep: 4969.85 | bwd_allreduce_microstep: 54.38 | step_microstep: 181.25 [2024-07-29 18:40:19,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27819.65 | bwd: 40449.80 | bwd_inner: 39665.31 | bwd_allreduce: 784.03 | step: 181.93 53%|█████▎ | 357/671 [6:57:05<6:01:06, 69.00s/it] {'loss': 1.1716, 'learning_rate': 9.468596282062112e-06, 'epoch': 0.53} 53%|█████▎ | 357/671 [6:57:05<6:01:06, 69.00s/it]dynamic ViT batch size: 14, 
images per sample: 7.0, dynamic token length: 2401 [2024-07-29 18:40:28,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.89 | bwd_microstep: 5400.54 | bwd_inner_microstep: 4986.94 | bwd_allreduce_microstep: 413.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3834 [2024-07-29 18:40:37,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.42 | bwd_microstep: 5047.91 | bwd_inner_microstep: 5012.29 | bwd_allreduce_microstep: 35.55 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3629 [2024-07-29 18:40:45,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.98 | bwd_microstep: 5209.63 | bwd_inner_microstep: 5120.17 | bwd_allreduce_microstep: 89.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3757 [2024-07-29 18:40:54,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.38 | bwd_microstep: 5155.63 | bwd_inner_microstep: 5104.85 | bwd_allreduce_microstep: 50.71 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3727 [2024-07-29 18:41:03,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.10 | bwd_microstep: 4982.63 | bwd_inner_microstep: 4960.79 | bwd_allreduce_microstep: 21.77 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3749 [2024-07-29 18:41:12,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.01 | bwd_microstep: 5007.91 | bwd_inner_microstep: 4988.48 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712 [2024-07-29 18:41:21,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.36 | bwd_microstep: 5234.93 | bwd_inner_microstep: 5101.82 | 
bwd_allreduce_microstep: 133.05 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145 [2024-07-29 18:41:30,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 18:41:30,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.94 | bwd_microstep: 5095.42 | bwd_inner_microstep: 4698.71 | bwd_allreduce_microstep: 396.64 | step_microstep: 207.62 [2024-07-29 18:41:30,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29171.99 | bwd: 41134.58 | bwd_inner: 39973.99 | bwd_allreduce: 1160.11 | step: 208.20 53%|█████▎ | 358/671 [6:58:16<6:02:33, 69.50s/it] {'loss': 1.2, 'learning_rate': 9.420338772951521e-06, 'epoch': 0.53} 53%|█████▎ | 358/671 [6:58:16<6:02:33, 69.50s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3560 [2024-07-29 18:41:39,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.56 | bwd_microstep: 5362.40 | bwd_inner_microstep: 5217.79 | bwd_allreduce_microstep: 144.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2452 [2024-07-29 18:41:47,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.72 | bwd_microstep: 5241.91 | bwd_inner_microstep: 4836.21 | bwd_allreduce_microstep: 405.64 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2238 [2024-07-29 18:41:56,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3074.59 | bwd_microstep: 5014.42 | bwd_inner_microstep: 4627.25 | bwd_allreduce_microstep: 387.11 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2208 [2024-07-29 18:42:04,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.90 | bwd_microstep: 5141.62 | bwd_inner_microstep: 4740.67 | bwd_allreduce_microstep: 400.88 | step_microstep: 0.08 dynamic ViT batch size: 26, 
images per sample: 13.0, dynamic token length: 3721 [2024-07-29 18:42:13,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.38 | bwd_microstep: 5041.87 | bwd_inner_microstep: 5012.48 | bwd_allreduce_microstep: 29.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-07-29 18:42:22,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.11 | bwd_microstep: 5206.13 | bwd_inner_microstep: 4800.04 | bwd_allreduce_microstep: 406.02 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2161 [2024-07-29 18:42:30,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3467.37 | bwd_microstep: 5050.88 | bwd_inner_microstep: 4658.32 | bwd_allreduce_microstep: 392.50 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2163 [2024-07-29 18:42:39,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 18:42:39,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3015.26 | bwd_microstep: 4953.24 | bwd_inner_microstep: 4570.68 | bwd_allreduce_microstep: 382.50 | step_microstep: 180.63 [2024-07-29 18:42:39,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27616.79 | bwd: 41012.45 | bwd_inner: 38463.39 | bwd_allreduce: 2548.60 | step: 181.20 54%|█████▎ | 359/671 [6:59:24<6:00:32, 69.33s/it] {'loss': 1.1524, 'learning_rate': 9.372094804706867e-06, 'epoch': 0.53} 54%|█████▎ | 359/671 [6:59:25<6:00:32, 69.33s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3562 [2024-07-29 18:42:47,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.14 | bwd_microstep: 5288.44 | bwd_inner_microstep: 5186.76 | bwd_allreduce_microstep: 101.62 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3883 [2024-07-29 18:42:56,856] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.07 | bwd_microstep: 5202.24 | bwd_inner_microstep: 5156.11 | bwd_allreduce_microstep: 46.07 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2239 [2024-07-29 18:43:05,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.83 | bwd_microstep: 5152.78 | bwd_inner_microstep: 4750.52 | bwd_allreduce_microstep: 402.20 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3633 [2024-07-29 18:43:14,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.33 | bwd_microstep: 5209.15 | bwd_inner_microstep: 5110.65 | bwd_allreduce_microstep: 98.44 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3747 [2024-07-29 18:43:22,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3227.65 | bwd_microstep: 4811.40 | bwd_inner_microstep: 4792.05 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2162 [2024-07-29 18:43:31,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.31 | bwd_microstep: 5092.90 | bwd_inner_microstep: 4697.44 | bwd_allreduce_microstep: 395.39 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2185 [2024-07-29 18:43:39,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3467.70 | bwd_microstep: 5124.56 | bwd_inner_microstep: 4727.28 | bwd_allreduce_microstep: 397.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3741 [2024-07-29 18:43:48,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.81 [2024-07-29 18:43:48,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.46 | bwd_microstep: 5131.03 | bwd_inner_microstep: 5078.62 | 
bwd_allreduce_microstep: 52.35 | step_microstep: 181.39 [2024-07-29 18:43:48,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28216.40 | bwd: 41012.49 | bwd_inner: 39499.37 | bwd_allreduce: 1512.64 | step: 181.99 54%|█████▎ | 360/671 [7:00:34<5:59:43, 69.40s/it] {'loss': 1.1684, 'learning_rate': 9.323865504305566e-06, 'epoch': 0.54} 54%|█████▎ | 360/671 [7:00:34<5:59:43, 69.40s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3987 [2024-07-29 18:43:57,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.17 | bwd_microstep: 5217.57 | bwd_inner_microstep: 5190.21 | bwd_allreduce_microstep: 27.29 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2046 [2024-07-29 18:44:06,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.33 | bwd_microstep: 5242.33 | bwd_inner_microstep: 4837.77 | bwd_allreduce_microstep: 404.49 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2273 [2024-07-29 18:44:15,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.23 | bwd_microstep: 5211.62 | bwd_inner_microstep: 4808.41 | bwd_allreduce_microstep: 403.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3730 [2024-07-29 18:44:23,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.86 | bwd_microstep: 5194.58 | bwd_inner_microstep: 5137.99 | bwd_allreduce_microstep: 56.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725 [2024-07-29 18:44:32,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.30 | bwd_microstep: 5130.16 | bwd_inner_microstep: 5083.35 | bwd_allreduce_microstep: 46.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768 [2024-07-29 18:44:41,488] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.71 | bwd_microstep: 5009.54 | bwd_inner_microstep: 4990.08 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2240 [2024-07-29 18:44:50,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.17 | bwd_microstep: 5202.55 | bwd_inner_microstep: 4798.33 | bwd_allreduce_microstep: 404.16 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2148 [2024-07-29 18:44:59,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 18:44:59,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.29 | bwd_microstep: 5229.60 | bwd_inner_microstep: 4825.18 | bwd_allreduce_microstep: 404.35 | step_microstep: 180.95 [2024-07-29 18:44:59,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28877.95 | bwd: 41437.94 | bwd_inner: 39671.25 | bwd_allreduce: 1766.20 | step: 181.54 54%|█████▍ | 361/671 [7:01:45<6:00:29, 69.77s/it] {'loss': 1.163, 'learning_rate': 9.275651998382377e-06, 'epoch': 0.54} 54%|█████▍ | 361/671 [7:01:45<6:00:29, 69.77s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2445 [2024-07-29 18:45:08,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.47 | bwd_microstep: 5365.82 | bwd_inner_microstep: 4952.25 | bwd_allreduce_microstep: 413.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3611 [2024-07-29 18:45:16,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3256.37 | bwd_microstep: 4973.23 | bwd_inner_microstep: 4911.04 | bwd_allreduce_microstep: 62.13 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2265 [2024-07-29 18:45:25,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.08 | bwd_microstep: 
5276.40 | bwd_inner_microstep: 4868.69 | bwd_allreduce_microstep: 407.65 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3757 [2024-07-29 18:45:33,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3122.20 | bwd_microstep: 4904.74 | bwd_inner_microstep: 4868.61 | bwd_allreduce_microstep: 36.07 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-07-29 18:45:42,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.15 | bwd_microstep: 5167.31 | bwd_inner_microstep: 4763.70 | bwd_allreduce_microstep: 403.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2173 [2024-07-29 18:45:50,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.62 | bwd_microstep: 5156.37 | bwd_inner_microstep: 4756.86 | bwd_allreduce_microstep: 399.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-29 18:45:59,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.16 | bwd_microstep: 5152.24 | bwd_inner_microstep: 5082.56 | bwd_allreduce_microstep: 69.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-29 18:46:08,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 18:46:08,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.69 | bwd_microstep: 5187.98 | bwd_inner_microstep: 5113.68 | bwd_allreduce_microstep: 74.23 | step_microstep: 181.29 [2024-07-29 18:46:08,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27836.65 | bwd: 41184.07 | bwd_inner: 39317.32 | bwd_allreduce: 1866.27 | step: 181.86 54%|█████▍ | 362/671 [7:02:54<5:58:40, 69.65s/it] {'loss': 1.1506, 'learning_rate': 9.227455413203117e-06, 'epoch': 0.54} 54%|█████▍ | 362/671 [7:02:54<5:58:40, 
69.65s/it]dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2111 [2024-07-29 18:46:17,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3310.44 | bwd_microstep: 5143.06 | bwd_inner_microstep: 4752.63 | bwd_allreduce_microstep: 390.37 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3796 [2024-07-29 18:46:25,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.25 | bwd_microstep: 5031.66 | bwd_inner_microstep: 5007.61 | bwd_allreduce_microstep: 23.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3788 [2024-07-29 18:46:34,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.58 | bwd_microstep: 5203.84 | bwd_inner_microstep: 5146.54 | bwd_allreduce_microstep: 57.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-29 18:46:43,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.03 | bwd_microstep: 5106.00 | bwd_inner_microstep: 5038.84 | bwd_allreduce_microstep: 67.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3639 [2024-07-29 18:46:52,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.15 | bwd_microstep: 5137.57 | bwd_inner_microstep: 5048.90 | bwd_allreduce_microstep: 88.61 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3671 [2024-07-29 18:47:00,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3704.04 | bwd_microstep: 4992.99 | bwd_inner_microstep: 4959.57 | bwd_allreduce_microstep: 33.35 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2129 [2024-07-29 18:47:09,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3476.73 | bwd_microstep: 5041.23 | 
bwd_inner_microstep: 4648.20 | bwd_allreduce_microstep: 392.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3671 [2024-07-29 18:47:18,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-29 18:47:18,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.26 | bwd_microstep: 4945.79 | bwd_inner_microstep: 4919.37 | bwd_allreduce_microstep: 26.35 | step_microstep: 184.78 [2024-07-29 18:47:18,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28793.39 | bwd: 40602.12 | bwd_inner: 39521.60 | bwd_allreduce: 1080.05 | step: 185.36 54%|█████▍ | 363/671 [7:04:04<5:57:38, 69.67s/it] {'loss': 1.1785, 'learning_rate': 9.179276874638315e-06, 'epoch': 0.54} 54%|█████▍ | 363/671 [7:04:04<5:57:38, 69.67s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3918 [2024-07-29 18:47:27,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3850.41 | bwd_microstep: 5215.18 | bwd_inner_microstep: 5189.52 | bwd_allreduce_microstep: 25.59 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2277 [2024-07-29 18:47:36,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3487.91 | bwd_microstep: 5121.82 | bwd_inner_microstep: 4721.81 | bwd_allreduce_microstep: 399.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3776 [2024-07-29 18:47:44,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.80 | bwd_microstep: 4998.38 | bwd_inner_microstep: 4979.04 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622 [2024-07-29 18:47:52,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3229.80 | bwd_microstep: 4812.87 | bwd_inner_microstep: 4773.54 | bwd_allreduce_microstep: 39.26 | step_microstep: 
0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3627 [2024-07-29 18:48:01,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.49 | bwd_microstep: 5128.50 | bwd_inner_microstep: 5039.40 | bwd_allreduce_microstep: 89.03 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740 [2024-07-29 18:48:10,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.86 | bwd_microstep: 5014.02 | bwd_inner_microstep: 4990.98 | bwd_allreduce_microstep: 22.98 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2162 [2024-07-29 18:48:18,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.77 | bwd_microstep: 5092.11 | bwd_inner_microstep: 4696.98 | bwd_allreduce_microstep: 395.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-29 18:48:27,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 18:48:27,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.20 | bwd_microstep: 5077.26 | bwd_inner_microstep: 5015.00 | bwd_allreduce_microstep: 62.19 | step_microstep: 181.28 [2024-07-29 18:48:27,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28734.13 | bwd: 40460.11 | bwd_inner: 39406.22 | bwd_allreduce: 1053.42 | step: 181.86 54%|█████▍ | 364/671 [7:05:13<5:56:15, 69.63s/it] {'loss': 1.1574, 'learning_rate': 9.131117508136952e-06, 'epoch': 0.54} 54%|█████▍ | 364/671 [7:05:13<5:56:15, 69.63s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3939 [2024-07-29 18:48:36,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3657.98 | bwd_microstep: 5204.09 | bwd_inner_microstep: 5163.50 | bwd_allreduce_microstep: 40.52 | step_microstep: 0.21 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3600 
[2024-07-29 18:48:45,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.84 | bwd_microstep: 5156.93 | bwd_inner_microstep: 5083.13 | bwd_allreduce_microstep: 73.74 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2088 [2024-07-29 18:48:54,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.65 | bwd_microstep: 5254.19 | bwd_inner_microstep: 4849.79 | bwd_allreduce_microstep: 404.34 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2089 [2024-07-29 18:49:02,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3073.95 | bwd_microstep: 5087.33 | bwd_inner_microstep: 4697.55 | bwd_allreduce_microstep: 389.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3759 [2024-07-29 18:49:11,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.44 | bwd_microstep: 5169.23 | bwd_inner_microstep: 5114.68 | bwd_allreduce_microstep: 54.48 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2186 [2024-07-29 18:49:20,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.93 | bwd_microstep: 5160.30 | bwd_inner_microstep: 4759.59 | bwd_allreduce_microstep: 400.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701 [2024-07-29 18:49:28,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.21 | bwd_microstep: 4964.10 | bwd_inner_microstep: 4920.32 | bwd_allreduce_microstep: 43.72 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-29 18:49:37,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 18:49:37,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.74 | bwd_microstep: 5098.61 | 
bwd_inner_microstep: 4702.28 | bwd_allreduce_microstep: 396.26 | step_microstep: 181.62 [2024-07-29 18:49:37,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28049.64 | bwd: 41094.77 | bwd_inner: 39290.77 | bwd_allreduce: 1803.52 | step: 182.33 54%|█████▍ | 365/671 [7:06:23<5:54:51, 69.58s/it] {'loss': 1.1907, 'learning_rate': 9.082978438700141e-06, 'epoch': 0.54} 54%|█████▍ | 365/671 [7:06:23<5:54:51, 69.58s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3936 [2024-07-29 18:49:46,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3684.70 | bwd_microstep: 5326.69 | bwd_inner_microstep: 5270.24 | bwd_allreduce_microstep: 56.39 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2275 [2024-07-29 18:49:54,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3049.25 | bwd_microstep: 5067.14 | bwd_inner_microstep: 4677.00 | bwd_allreduce_microstep: 390.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3596 [2024-07-29 18:50:03,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.06 | bwd_microstep: 5187.73 | bwd_inner_microstep: 5099.81 | bwd_allreduce_microstep: 87.86 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3759 [2024-07-29 18:50:12,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.72 | bwd_microstep: 5194.68 | bwd_inner_microstep: 5122.72 | bwd_allreduce_microstep: 71.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3714 [2024-07-29 18:50:20,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.75 | bwd_microstep: 5182.40 | bwd_inner_microstep: 5120.65 | bwd_allreduce_microstep: 61.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3719 [2024-07-29 
18:50:29,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.18 | bwd_microstep: 5171.73 | bwd_inner_microstep: 5113.35 | bwd_allreduce_microstep: 58.31 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665 [2024-07-29 18:50:38,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.12 | bwd_microstep: 5167.07 | bwd_inner_microstep: 5093.54 | bwd_allreduce_microstep: 73.47 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3679 [2024-07-29 18:50:47,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.45 [2024-07-29 18:50:47,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.40 | bwd_microstep: 4972.80 | bwd_inner_microstep: 4902.61 | bwd_allreduce_microstep: 70.12 | step_microstep: 181.01 [2024-07-29 18:50:47,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28519.09 | bwd: 41270.23 | bwd_inner: 40399.85 | bwd_allreduce: 869.91 | step: 181.58 55%|█████▍ | 366/671 [7:07:33<5:54:31, 69.74s/it] {'loss': 1.1482, 'learning_rate': 9.034860790854848e-06, 'epoch': 0.54} 55%|█████▍ | 366/671 [7:07:33<5:54:31, 69.74s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2105 [2024-07-29 18:50:56,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.55 | bwd_microstep: 5380.34 | bwd_inner_microstep: 4963.76 | bwd_allreduce_microstep: 416.51 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4030 [2024-07-29 18:51:05,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3848.54 | bwd_microstep: 5300.55 | bwd_inner_microstep: 5281.15 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2253 [2024-07-29 18:51:14,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3310.89 | bwd_microstep: 5091.40 | bwd_inner_microstep: 4698.04 | bwd_allreduce_microstep: 393.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3639 [2024-07-29 18:51:22,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3116.34 | bwd_microstep: 4982.25 | bwd_inner_microstep: 4917.98 | bwd_allreduce_microstep: 64.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3714 [2024-07-29 18:51:30,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.39 | bwd_microstep: 5042.93 | bwd_inner_microstep: 5015.02 | bwd_allreduce_microstep: 27.84 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2195 [2024-07-29 18:51:39,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3441.79 | bwd_microstep: 5013.30 | bwd_inner_microstep: 4624.26 | bwd_allreduce_microstep: 388.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705 [2024-07-29 18:51:48,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.15 | bwd_microstep: 4981.59 | bwd_inner_microstep: 4951.41 | bwd_allreduce_microstep: 30.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3713 [2024-07-29 18:51:57,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 18:51:57,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.32 | bwd_microstep: 5138.68 | bwd_inner_microstep: 5084.32 | bwd_allreduce_microstep: 54.28 | step_microstep: 181.14 [2024-07-29 18:51:57,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28386.87 | bwd: 40931.01 | bwd_inner: 39535.88 | bwd_allreduce: 1394.66 | step: 181.70 55%|█████▍ | 367/671 [7:08:43<5:53:13, 69.71s/it] {'loss': 1.1086, 'learning_rate': 8.986765688627652e-06, 'epoch': 0.55} 55%|█████▍ | 
367/671 [7:08:43<5:53:13, 69.71s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3808 [2024-07-29 18:52:06,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.46 | bwd_microstep: 5329.61 | bwd_inner_microstep: 5265.09 | bwd_allreduce_microstep: 64.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3829 [2024-07-29 18:52:14,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.03 | bwd_microstep: 5060.14 | bwd_inner_microstep: 5039.01 | bwd_allreduce_microstep: 21.07 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2263 [2024-07-29 18:52:23,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.95 | bwd_microstep: 5262.23 | bwd_inner_microstep: 4852.03 | bwd_allreduce_microstep: 410.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-07-29 18:52:32,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.54 | bwd_microstep: 5139.75 | bwd_inner_microstep: 5093.89 | bwd_allreduce_microstep: 45.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3712 [2024-07-29 18:52:41,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.55 | bwd_microstep: 5143.07 | bwd_inner_microstep: 5071.75 | bwd_allreduce_microstep: 71.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-07-29 18:52:49,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3168.05 | bwd_microstep: 4654.07 | bwd_inner_microstep: 4634.71 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-29 18:52:57,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.64 | bwd_microstep: 
5098.51 | bwd_inner_microstep: 5037.09 | bwd_allreduce_microstep: 61.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3699 [2024-07-29 18:53:06,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 18:53:06,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.85 | bwd_microstep: 5002.99 | bwd_inner_microstep: 4937.64 | bwd_allreduce_microstep: 65.29 | step_microstep: 181.55 [2024-07-29 18:53:06,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28463.97 | bwd: 40690.35 | bwd_inner: 39931.13 | bwd_allreduce: 758.75 | step: 182.14 55%|█████▍ | 368/671 [7:09:52<5:51:42, 69.64s/it] {'loss': 1.1637, 'learning_rate': 8.938694255518442e-06, 'epoch': 0.55} 55%|█████▍ | 368/671 [7:09:52<5:51:42, 69.64s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3536 [2024-07-29 18:53:15,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.40 | bwd_microstep: 5152.75 | bwd_inner_microstep: 5067.76 | bwd_allreduce_microstep: 84.92 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2281 [2024-07-29 18:53:24,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.78 | bwd_microstep: 5299.25 | bwd_inner_microstep: 4889.42 | bwd_allreduce_microstep: 409.76 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615 [2024-07-29 18:53:32,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.39 | bwd_microstep: 5115.34 | bwd_inner_microstep: 5044.84 | bwd_allreduce_microstep: 70.44 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3816 [2024-07-29 18:53:41,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.93 | bwd_microstep: 5117.82 | bwd_inner_microstep: 5065.64 | bwd_allreduce_microstep: 52.12 | 
step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2112 [2024-07-29 18:53:50,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.49 | bwd_microstep: 5220.92 | bwd_inner_microstep: 4814.54 | bwd_allreduce_microstep: 406.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-07-29 18:53:58,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3215.82 | bwd_microstep: 4804.69 | bwd_inner_microstep: 4785.26 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-29 18:54:06,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3010.71 | bwd_microstep: 4893.61 | bwd_inner_microstep: 4517.42 | bwd_allreduce_microstep: 376.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-29 18:54:14,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-29 18:54:14,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3212.18 | bwd_microstep: 4713.41 | bwd_inner_microstep: 4690.85 | bwd_allreduce_microstep: 22.50 | step_microstep: 181.15 [2024-07-29 18:54:14,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27374.61 | bwd: 40317.77 | bwd_inner: 38875.67 | bwd_allreduce: 1441.63 | step: 181.83 55%|█████▍ | 369/671 [7:11:00<5:48:05, 69.16s/it] {'loss': 1.1991, 'learning_rate': 8.890647614474223e-06, 'epoch': 0.55} 55%|█████▍ | 369/671 [7:11:00<5:48:05, 69.16s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3876 [2024-07-29 18:54:23,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.21 | bwd_microstep: 5110.39 | bwd_inner_microstep: 5091.18 | bwd_allreduce_microstep: 19.15 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic 
token length: 3839 [2024-07-29 18:54:32,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.45 | bwd_microstep: 5062.92 | bwd_inner_microstep: 5043.58 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3764 [2024-07-29 18:54:41,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.06 | bwd_microstep: 5005.84 | bwd_inner_microstep: 4986.46 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3800 [2024-07-29 18:54:49,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.71 | bwd_microstep: 5034.84 | bwd_inner_microstep: 5015.45 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729 [2024-07-29 18:54:58,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.44 | bwd_microstep: 5041.77 | bwd_inner_microstep: 5017.04 | bwd_allreduce_microstep: 24.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3743 [2024-07-29 18:55:07,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.80 | bwd_microstep: 5049.99 | bwd_inner_microstep: 5009.46 | bwd_allreduce_microstep: 40.46 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-29 18:55:16,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.29 | bwd_microstep: 4988.39 | bwd_inner_microstep: 4968.94 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-29 18:55:24,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-29 18:55:24,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3469.56 | 
bwd_microstep: 5045.31 | bwd_inner_microstep: 4652.80 | bwd_allreduce_microstep: 392.44 | step_microstep: 181.23 [2024-07-29 18:55:24,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29593.42 | bwd: 40339.43 | bwd_inner: 39784.86 | bwd_allreduce: 554.09 | step: 181.82 55%|█████▌ | 370/671 [7:12:10<5:48:36, 69.49s/it] {'loss': 1.1559, 'learning_rate': 8.842626887862832e-06, 'epoch': 0.55} 55%|█████▌ | 370/671 [7:12:10<5:48:36, 69.49s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3898 [2024-07-29 18:55:33,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3688.36 | bwd_microstep: 5342.32 | bwd_inner_microstep: 5279.75 | bwd_allreduce_microstep: 62.50 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3841 [2024-07-29 18:55:42,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.67 | bwd_microstep: 5168.24 | bwd_inner_microstep: 5099.15 | bwd_allreduce_microstep: 69.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3782 [2024-07-29 18:55:51,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.80 | bwd_microstep: 5236.15 | bwd_inner_microstep: 5174.92 | bwd_allreduce_microstep: 61.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658 [2024-07-29 18:55:59,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3230.38 | bwd_microstep: 4876.48 | bwd_inner_microstep: 4830.44 | bwd_allreduce_microstep: 45.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-07-29 18:56:08,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.29 | bwd_microstep: 4989.25 | bwd_inner_microstep: 4969.83 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 
2111 [2024-07-29 18:56:17,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.34 | bwd_microstep: 5207.71 | bwd_inner_microstep: 4801.98 | bwd_allreduce_microstep: 405.66 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-07-29 18:56:25,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.16 | bwd_microstep: 5177.78 | bwd_inner_microstep: 4776.26 | bwd_allreduce_microstep: 401.46 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3649 [2024-07-29 18:56:34,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 18:56:34,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3054.07 | bwd_microstep: 4809.32 | bwd_inner_microstep: 4769.14 | bwd_allreduce_microstep: 40.12 | step_microstep: 181.14 [2024-07-29 18:56:34,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28044.96 | bwd: 40807.22 | bwd_inner: 39701.40 | bwd_allreduce: 1105.35 | step: 181.71 55%|█████▌ | 371/671 [7:13:19<5:46:58, 69.40s/it] {'loss': 1.2513, 'learning_rate': 8.79463319744677e-06, 'epoch': 0.55} 55%|█████▌ | 371/671 [7:13:19<5:46:58, 69.40s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2343 [2024-07-29 18:56:42,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.52 | bwd_microstep: 5268.92 | bwd_inner_microstep: 4862.49 | bwd_allreduce_microstep: 406.36 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3590 [2024-07-29 18:56:51,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3155.37 | bwd_microstep: 5097.05 | bwd_inner_microstep: 5021.11 | bwd_allreduce_microstep: 75.88 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3803 [2024-07-29 18:57:00,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3802.71 | bwd_microstep: 5057.41 | bwd_inner_microstep: 5034.82 | bwd_allreduce_microstep: 22.52 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2229 [2024-07-29 18:57:08,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.93 | bwd_microstep: 5246.14 | bwd_inner_microstep: 4839.01 | bwd_allreduce_microstep: 407.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737 [2024-07-29 18:57:17,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.02 | bwd_microstep: 5183.66 | bwd_inner_microstep: 5128.43 | bwd_allreduce_microstep: 55.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-29 18:57:25,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3237.22 | bwd_microstep: 4866.30 | bwd_inner_microstep: 4822.24 | bwd_allreduce_microstep: 43.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-07-29 18:57:34,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.07 | bwd_microstep: 5202.45 | bwd_inner_microstep: 5120.56 | bwd_allreduce_microstep: 81.83 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156 [2024-07-29 18:57:43,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 18:57:43,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.22 | bwd_microstep: 5114.51 | bwd_inner_microstep: 4717.24 | bwd_allreduce_microstep: 397.21 | step_microstep: 181.67 [2024-07-29 18:57:43,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28104.95 | bwd: 41036.43 | bwd_inner: 39545.84 | bwd_allreduce: 1490.13 | step: 182.23 55%|█████▌ | 372/671 [7:14:29<5:45:56, 69.42s/it] {'loss': 1.1638, 'learning_rate': 8.74666766435696e-06, 'epoch': 0.55} 
55%|█████▌ | 372/671 [7:14:29<5:45:56, 69.42s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2291 [2024-07-29 18:57:52,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.84 | bwd_microstep: 5240.54 | bwd_inner_microstep: 4835.06 | bwd_allreduce_microstep: 405.41 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2214 [2024-07-29 18:58:01,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.57 | bwd_microstep: 5280.88 | bwd_inner_microstep: 4871.95 | bwd_allreduce_microstep: 408.87 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3596 [2024-07-29 18:58:09,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3116.68 | bwd_microstep: 4992.62 | bwd_inner_microstep: 4919.30 | bwd_allreduce_microstep: 73.25 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2184 [2024-07-29 18:58:18,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.53 | bwd_microstep: 5211.80 | bwd_inner_microstep: 4806.16 | bwd_allreduce_microstep: 405.58 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-07-29 18:58:26,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3452.10 | bwd_microstep: 5032.40 | bwd_inner_microstep: 4641.08 | bwd_allreduce_microstep: 391.25 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729 [2024-07-29 18:58:35,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.78 | bwd_microstep: 4974.65 | bwd_inner_microstep: 4955.26 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709 [2024-07-29 18:58:43,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.75 | 
bwd_microstep: 4892.91 | bwd_inner_microstep: 4873.52 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3703 [2024-07-29 18:58:52,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-29 18:58:52,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3095.85 | bwd_microstep: 4868.24 | bwd_inner_microstep: 4824.19 | bwd_allreduce_microstep: 43.98 | step_microstep: 182.09 [2024-07-29 18:58:52,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27725.00 | bwd: 40494.03 | bwd_inner: 38726.46 | bwd_allreduce: 1767.09 | step: 182.68 56%|█████▌ | 373/671 [7:15:38<5:43:28, 69.16s/it] {'loss': 1.1504, 'learning_rate': 8.698731409066571e-06, 'epoch': 0.56} 56%|█████▌ | 373/671 [7:15:38<5:43:28, 69.16s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3962 [2024-07-29 18:59:01,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.74 | bwd_microstep: 5681.36 | bwd_inner_microstep: 5622.31 | bwd_allreduce_microstep: 58.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3767 [2024-07-29 18:59:10,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.34 | bwd_microstep: 5223.28 | bwd_inner_microstep: 5159.95 | bwd_allreduce_microstep: 63.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3767 [2024-07-29 18:59:19,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.32 | bwd_microstep: 5000.03 | bwd_inner_microstep: 4980.63 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3639 [2024-07-29 18:59:27,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.24 | bwd_microstep: 5158.88 | bwd_inner_microstep: 5066.81 | bwd_allreduce_microstep: 92.00 
| step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3769 [2024-07-29 18:59:36,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.35 | bwd_microstep: 5025.04 | bwd_inner_microstep: 5000.65 | bwd_allreduce_microstep: 24.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2182 [2024-07-29 18:59:45,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.49 | bwd_microstep: 5115.63 | bwd_inner_microstep: 4720.34 | bwd_allreduce_microstep: 395.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3674 [2024-07-29 18:59:53,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.88 | bwd_microstep: 4939.19 | bwd_inner_microstep: 4912.48 | bwd_allreduce_microstep: 26.64 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3702 [2024-07-29 19:00:02,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 19:00:02,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.30 | bwd_microstep: 5027.52 | bwd_inner_microstep: 4951.54 | bwd_allreduce_microstep: 75.91 | step_microstep: 181.57 [2024-07-29 19:00:02,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29226.58 | bwd: 41170.91 | bwd_inner: 40414.65 | bwd_allreduce: 755.78 | step: 182.29 56%|█████▌ | 374/671 [7:16:48<5:44:39, 69.63s/it] {'loss': 1.1642, 'learning_rate': 8.650825551364844e-06, 'epoch': 0.56} 56%|█████▌ | 374/671 [7:16:48<5:44:39, 69.63s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2319 [2024-07-29 19:00:11,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.62 | bwd_microstep: 5164.45 | bwd_inner_microstep: 4763.10 | bwd_allreduce_microstep: 401.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic 
token length: 3602 [2024-07-29 19:00:20,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3647.58 | bwd_microstep: 5297.43 | bwd_inner_microstep: 5204.87 | bwd_allreduce_microstep: 92.49 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3611 [2024-07-29 19:00:29,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.66 | bwd_microstep: 5143.67 | bwd_inner_microstep: 5068.22 | bwd_allreduce_microstep: 75.38 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743 [2024-07-29 19:00:38,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.30 | bwd_microstep: 4983.93 | bwd_inner_microstep: 4964.62 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-29 19:00:46,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.00 | bwd_microstep: 5168.08 | bwd_inner_microstep: 5109.95 | bwd_allreduce_microstep: 58.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-29 19:00:55,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.02 | bwd_microstep: 5073.07 | bwd_inner_microstep: 5013.45 | bwd_allreduce_microstep: 59.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710 [2024-07-29 19:01:04,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.56 | bwd_microstep: 5176.93 | bwd_inner_microstep: 5102.34 | bwd_allreduce_microstep: 74.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2151 [2024-07-29 19:01:13,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 19:01:13,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3481.20 | 
bwd_microstep: 5065.41 | bwd_inner_microstep: 4671.59 | bwd_allreduce_microstep: 393.75 | step_microstep: 515.45 [2024-07-29 19:01:13,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28825.86 | bwd: 41072.95 | bwd_inner: 39898.09 | bwd_allreduce: 1174.40 | step: 516.04 56%|█████▌ | 375/671 [7:17:59<5:44:52, 69.91s/it] {'loss': 1.2143, 'learning_rate': 8.60295121033094e-06, 'epoch': 0.56} 56%|█████▌ | 375/671 [7:17:59<5:44:52, 69.91s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3546 [2024-07-29 19:01:22,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.04 | bwd_microstep: 5183.23 | bwd_inner_microstep: 5050.43 | bwd_allreduce_microstep: 132.74 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3615 [2024-07-29 19:01:30,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.92 | bwd_microstep: 5218.75 | bwd_inner_microstep: 5120.79 | bwd_allreduce_microstep: 97.90 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2223 [2024-07-29 19:01:39,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.02 | bwd_microstep: 5226.40 | bwd_inner_microstep: 4818.02 | bwd_allreduce_microstep: 408.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3738 [2024-07-29 19:01:47,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3254.51 | bwd_microstep: 4795.15 | bwd_inner_microstep: 4775.72 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3739 [2024-07-29 19:01:56,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.31 | bwd_microstep: 5034.75 | bwd_inner_microstep: 5009.06 | bwd_allreduce_microstep: 25.62 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 
2195 [2024-07-29 19:02:05,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3495.83 | bwd_microstep: 5083.80 | bwd_inner_microstep: 4688.45 | bwd_allreduce_microstep: 395.29 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-29 19:02:13,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3203.79 | bwd_microstep: 4687.24 | bwd_inner_microstep: 4667.93 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3666 [2024-07-29 19:02:21,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-29 19:02:21,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3659.61 | bwd_microstep: 4886.94 | bwd_inner_microstep: 4867.65 | bwd_allreduce_microstep: 19.22 | step_microstep: 181.08 [2024-07-29 19:02:21,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28153.92 | bwd: 40116.26 | bwd_inner: 38997.99 | bwd_allreduce: 1117.79 | step: 181.64 56%|█████▌ | 376/671 [7:19:07<5:41:46, 69.52s/it] {'loss': 1.167, 'learning_rate': 8.555109504307787e-06, 'epoch': 0.56} 56%|█████▌ | 376/671 [7:19:07<5:41:46, 69.52s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2365 [2024-07-29 19:02:30,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3125.32 | bwd_microstep: 5168.22 | bwd_inner_microstep: 4774.67 | bwd_allreduce_microstep: 393.48 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2283 [2024-07-29 19:02:39,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.50 | bwd_microstep: 5299.42 | bwd_inner_microstep: 4890.52 | bwd_allreduce_microstep: 408.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3573 [2024-07-29 19:02:47,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3567.77 | bwd_microstep: 5102.90 | bwd_inner_microstep: 5022.24 | bwd_allreduce_microstep: 80.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602 [2024-07-29 19:02:56,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.38 | bwd_microstep: 5157.54 | bwd_inner_microstep: 5080.72 | bwd_allreduce_microstep: 76.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3742 [2024-07-29 19:03:05,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.67 | bwd_microstep: 5196.80 | bwd_inner_microstep: 5135.96 | bwd_allreduce_microstep: 60.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3800 [2024-07-29 19:03:14,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.25 | bwd_microstep: 5030.94 | bwd_inner_microstep: 5011.65 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2196 [2024-07-29 19:03:22,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3001.92 | bwd_microstep: 4880.19 | bwd_inner_microstep: 4506.38 | bwd_allreduce_microstep: 373.74 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3680 [2024-07-29 19:03:31,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 19:03:31,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.34 | bwd_microstep: 5077.77 | bwd_inner_microstep: 5004.90 | bwd_allreduce_microstep: 72.81 | step_microstep: 180.92 [2024-07-29 19:03:31,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27851.05 | bwd: 40913.75 | bwd_inner: 39426.98 | bwd_allreduce: 1486.31 | step: 181.50 56%|█████▌ | 377/671 [7:20:16<5:40:00, 69.39s/it] {'loss': 1.0684, 'learning_rate': 8.50730155087596e-06, 'epoch': 0.56} 
56%|█████▌ | 377/671 [7:20:16<5:40:00, 69.39s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-29 19:03:38,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3211.48 | bwd_microstep: 4718.66 | bwd_inner_microstep: 4699.48 | bwd_allreduce_microstep: 19.11 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2302 [2024-07-29 19:03:46,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3003.75 | bwd_microstep: 4934.33 | bwd_inner_microstep: 4555.32 | bwd_allreduce_microstep: 378.94 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3780 [2024-07-29 19:03:55,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.20 | bwd_microstep: 5019.65 | bwd_inner_microstep: 5000.24 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201 [2024-07-29 19:04:04,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.86 | bwd_microstep: 5255.24 | bwd_inner_microstep: 4847.30 | bwd_allreduce_microstep: 407.87 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-07-29 19:04:12,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3231.37 | bwd_microstep: 4798.74 | bwd_inner_microstep: 4761.28 | bwd_allreduce_microstep: 37.39 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3739 [2024-07-29 19:04:21,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.17 | bwd_microstep: 5023.63 | bwd_inner_microstep: 4998.81 | bwd_allreduce_microstep: 24.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695 [2024-07-29 19:04:29,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.32 | 
bwd_microstep: 5055.62 | bwd_inner_microstep: 5001.37 | bwd_allreduce_microstep: 54.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3734 [2024-07-29 19:04:38,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-29 19:04:38,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.72 | bwd_microstep: 5182.14 | bwd_inner_microstep: 5127.15 | bwd_allreduce_microstep: 54.93 | step_microstep: 181.21 [2024-07-29 19:04:38,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27653.78 | bwd: 39988.00 | bwd_inner: 38990.89 | bwd_allreduce: 996.64 | step: 181.80 56%|█████▋ | 378/671 [7:21:24<5:36:46, 68.96s/it] {'loss': 1.1521, 'learning_rate': 8.459528466827576e-06, 'epoch': 0.56} 56%|█████▋ | 378/671 [7:21:24<5:36:46, 68.96s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3689 [2024-07-29 19:04:47,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3172.55 | bwd_microstep: 5353.41 | bwd_inner_microstep: 5255.35 | bwd_allreduce_microstep: 97.99 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2301 [2024-07-29 19:04:55,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3063.56 | bwd_microstep: 5028.54 | bwd_inner_microstep: 4637.42 | bwd_allreduce_microstep: 391.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3805 [2024-07-29 19:05:04,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.60 | bwd_microstep: 5047.84 | bwd_inner_microstep: 5025.64 | bwd_allreduce_microstep: 22.13 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3780 [2024-07-29 19:05:13,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.39 | bwd_microstep: 5196.15 | bwd_inner_microstep: 5141.84 | bwd_allreduce_microstep: 54.25 
| step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2086 [2024-07-29 19:05:21,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3020.08 | bwd_microstep: 4956.09 | bwd_inner_microstep: 4572.97 | bwd_allreduce_microstep: 383.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-29 19:05:30,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.81 | bwd_microstep: 4997.57 | bwd_inner_microstep: 4974.12 | bwd_allreduce_microstep: 23.39 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2127 [2024-07-29 19:05:38,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.61 | bwd_microstep: 5114.97 | bwd_inner_microstep: 4717.75 | bwd_allreduce_microstep: 397.16 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2165 [2024-07-29 19:05:46,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 19:05:46,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3018.37 | bwd_microstep: 4928.95 | bwd_inner_microstep: 4550.91 | bwd_allreduce_microstep: 377.98 | step_microstep: 208.21 [2024-07-29 19:05:46,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 26903.89 | bwd: 40623.50 | bwd_inner: 38875.93 | bwd_allreduce: 1747.10 | step: 208.88 56%|█████▋ | 379/671 [7:22:32<5:34:02, 68.64s/it] {'loss': 1.1821, 'learning_rate': 8.411791368140197e-06, 'epoch': 0.56} 56%|█████▋ | 379/671 [7:22:32<5:34:02, 68.64s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2298 [2024-07-29 19:05:55,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.22 | bwd_microstep: 5281.34 | bwd_inner_microstep: 4872.01 | bwd_allreduce_microstep: 409.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic 
token length: 3809 [2024-07-29 19:06:04,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.12 | bwd_microstep: 5116.13 | bwd_inner_microstep: 5071.85 | bwd_allreduce_microstep: 44.21 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3746 [2024-07-29 19:06:13,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.34 | bwd_microstep: 5100.86 | bwd_inner_microstep: 5034.52 | bwd_allreduce_microstep: 66.28 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3823 [2024-07-29 19:06:21,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.39 | bwd_microstep: 5155.39 | bwd_inner_microstep: 5102.43 | bwd_allreduce_microstep: 52.89 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-07-29 19:06:30,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.05 | bwd_microstep: 5232.79 | bwd_inner_microstep: 4826.73 | bwd_allreduce_microstep: 405.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-29 19:06:39,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.06 | bwd_microstep: 5012.42 | bwd_inner_microstep: 4952.84 | bwd_allreduce_microstep: 59.52 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2147 [2024-07-29 19:06:47,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2995.45 | bwd_microstep: 4872.11 | bwd_inner_microstep: 4494.91 | bwd_allreduce_microstep: 377.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702 [2024-07-29 19:06:56,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 19:06:56,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.66 | 
bwd_microstep: 5051.66 | bwd_inner_microstep: 4993.52 | bwd_allreduce_microstep: 58.05 | step_microstep: 180.61 [2024-07-29 19:06:56,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28191.21 | bwd: 40822.67 | bwd_inner: 39348.75 | bwd_allreduce: 1473.44 | step: 181.20 57%|█████▋ | 380/671 [7:23:42<5:33:54, 68.85s/it] {'loss': 1.1589, 'learning_rate': 8.364091369950783e-06, 'epoch': 0.57} 57%|█████▋ | 380/671 [7:23:42<5:33:54, 68.85s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3925 [2024-07-29 19:07:05,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.64 | bwd_microstep: 5377.90 | bwd_inner_microstep: 5330.11 | bwd_allreduce_microstep: 47.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3550 [2024-07-29 19:07:14,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.71 | bwd_microstep: 5203.09 | bwd_inner_microstep: 5109.75 | bwd_allreduce_microstep: 93.27 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2062 [2024-07-29 19:07:22,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.47 | bwd_microstep: 5284.45 | bwd_inner_microstep: 4875.99 | bwd_allreduce_microstep: 408.39 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3603 [2024-07-29 19:07:31,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.37 | bwd_microstep: 5143.52 | bwd_inner_microstep: 5055.07 | bwd_allreduce_microstep: 88.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-07-29 19:07:40,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.02 | bwd_microstep: 5027.52 | bwd_inner_microstep: 4976.94 | bwd_allreduce_microstep: 50.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 
3687 [2024-07-29 19:07:48,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.34 | bwd_microstep: 4923.48 | bwd_inner_microstep: 4885.09 | bwd_allreduce_microstep: 38.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-29 19:07:57,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.72 | bwd_microstep: 5048.75 | bwd_inner_microstep: 4990.63 | bwd_allreduce_microstep: 58.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3677 [2024-07-29 19:08:06,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-29 19:08:06,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.52 | bwd_microstep: 4916.23 | bwd_inner_microstep: 4892.41 | bwd_allreduce_microstep: 23.76 | step_microstep: 181.01 [2024-07-29 19:08:06,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28788.69 | bwd: 40924.92 | bwd_inner: 40115.93 | bwd_allreduce: 808.52 | step: 181.71 57%|█████▋ | 381/671 [7:24:52<5:34:29, 69.21s/it] {'loss': 1.1123, 'learning_rate': 8.316429586529616e-06, 'epoch': 0.57} 57%|█████▋ | 381/671 [7:24:52<5:34:29, 69.21s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2350 [2024-07-29 19:08:15,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.34 | bwd_microstep: 5415.64 | bwd_inner_microstep: 5001.41 | bwd_allreduce_microstep: 414.17 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2373 [2024-07-29 19:08:23,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3116.51 | bwd_microstep: 5164.86 | bwd_inner_microstep: 4769.24 | bwd_allreduce_microstep: 395.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3808 [2024-07-29 19:08:32,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3581.88 | bwd_microstep: 5052.66 | bwd_inner_microstep: 5012.65 | bwd_allreduce_microstep: 39.94 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2089 [2024-07-29 19:08:40,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3050.24 | bwd_microstep: 5020.47 | bwd_inner_microstep: 4632.60 | bwd_allreduce_microstep: 387.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3726 [2024-07-29 19:08:49,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.45 | bwd_microstep: 5190.90 | bwd_inner_microstep: 5129.84 | bwd_allreduce_microstep: 61.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-29 19:08:57,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3191.12 | bwd_microstep: 4759.43 | bwd_inner_microstep: 4726.50 | bwd_allreduce_microstep: 32.87 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2173 [2024-07-29 19:09:05,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.49 | bwd_microstep: 5222.61 | bwd_inner_microstep: 4818.80 | bwd_allreduce_microstep: 403.74 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-29 19:09:14,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 19:09:14,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.37 | bwd_microstep: 4993.34 | bwd_inner_microstep: 4944.31 | bwd_allreduce_microstep: 48.97 | step_microstep: 182.47 [2024-07-29 19:09:14,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27304.30 | bwd: 40819.89 | bwd_inner: 39035.27 | bwd_allreduce: 1784.15 | step: 183.04 57%|█████▋ | 382/671 [7:26:00<5:32:15, 68.98s/it] {'loss': 1.1787, 'learning_rate': 8.268807131254288e-06, 'epoch': 
0.57} 57%|█████▋ | 382/671 [7:26:00<5:32:15, 68.98s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2143 [2024-07-29 19:09:23,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.39 | bwd_microstep: 5209.59 | bwd_inner_microstep: 4807.95 | bwd_allreduce_microstep: 401.57 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3808 [2024-07-29 19:09:32,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.71 | bwd_microstep: 5029.62 | bwd_inner_microstep: 5010.23 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3610 [2024-07-29 19:09:41,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.82 | bwd_microstep: 5180.90 | bwd_inner_microstep: 5092.50 | bwd_allreduce_microstep: 88.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-07-29 19:09:49,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3465.14 | bwd_microstep: 5091.48 | bwd_inner_microstep: 4694.83 | bwd_allreduce_microstep: 396.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622 [2024-07-29 19:09:58,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.68 | bwd_microstep: 5153.45 | bwd_inner_microstep: 5066.15 | bwd_allreduce_microstep: 87.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735 [2024-07-29 19:10:07,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.37 | bwd_microstep: 4985.00 | bwd_inner_microstep: 4965.62 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2151 [2024-07-29 19:10:15,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.90 
| bwd_microstep: 5039.50 | bwd_inner_microstep: 4646.04 | bwd_allreduce_microstep: 393.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-07-29 19:10:24,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 19:10:24,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.63 | bwd_microstep: 5042.30 | bwd_inner_microstep: 4980.01 | bwd_allreduce_microstep: 62.23 | step_microstep: 181.35 [2024-07-29 19:10:24,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28916.54 | bwd: 40731.81 | bwd_inner: 39263.26 | bwd_allreduce: 1468.07 | step: 181.94 57%|█████▋ | 383/671 [7:27:10<5:32:32, 69.28s/it] {'loss': 1.1102, 'learning_rate': 8.22122511658368e-06, 'epoch': 0.57} 57%|█████▋ | 383/671 [7:27:10<5:32:32, 69.28s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3911 [2024-07-29 19:10:33,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.07 | bwd_microstep: 5316.69 | bwd_inner_microstep: 5259.80 | bwd_allreduce_microstep: 56.82 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3781 [2024-07-29 19:10:42,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.72 | bwd_microstep: 5007.99 | bwd_inner_microstep: 4988.55 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3774 [2024-07-29 19:10:51,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.12 | bwd_microstep: 5007.47 | bwd_inner_microstep: 4988.18 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2218 [2024-07-29 19:10:59,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3476.23 | bwd_microstep: 5054.31 | bwd_inner_microstep: 4662.66 | bwd_allreduce_microstep: 
391.58 | step_microstep: 0.09 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2086 [2024-07-29 19:11:08,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.71 | bwd_microstep: 5217.82 | bwd_inner_microstep: 4813.48 | bwd_allreduce_microstep: 404.27 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2188 [2024-07-29 19:11:17,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.76 | bwd_microstep: 5187.32 | bwd_inner_microstep: 4783.11 | bwd_allreduce_microstep: 404.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-29 19:11:25,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3194.90 | bwd_microstep: 4747.60 | bwd_inner_microstep: 4724.45 | bwd_allreduce_microstep: 23.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2178 [2024-07-29 19:11:34,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.43 [2024-07-29 19:11:34,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.07 | bwd_microstep: 5123.60 | bwd_inner_microstep: 4725.22 | bwd_allreduce_microstep: 398.31 | step_microstep: 182.12 [2024-07-29 19:11:34,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28446.48 | bwd: 40662.77 | bwd_inner: 38945.40 | bwd_allreduce: 1716.91 | step: 182.81 57%|█████▋ | 384/671 [7:28:20<5:31:36, 69.33s/it] {'loss': 1.1509, 'learning_rate': 8.173684654031986e-06, 'epoch': 0.57} 57%|█████▋ | 384/671 [7:28:20<5:31:36, 69.33s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2364 [2024-07-29 19:11:42,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.20 | bwd_microstep: 5290.67 | bwd_inner_microstep: 4884.18 | bwd_allreduce_microstep: 406.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, 
dynamic token length: 3797 [2024-07-29 19:11:51,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.96 | bwd_microstep: 5041.60 | bwd_inner_microstep: 5022.22 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3597 [2024-07-29 19:11:59,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3243.09 | bwd_microstep: 4845.79 | bwd_inner_microstep: 4797.27 | bwd_allreduce_microstep: 48.45 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2224 [2024-07-29 19:12:08,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.94 | bwd_microstep: 5221.81 | bwd_inner_microstep: 4814.18 | bwd_allreduce_microstep: 407.57 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3752 [2024-07-29 19:12:17,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.60 | bwd_microstep: 4998.78 | bwd_inner_microstep: 4979.43 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-29 19:12:26,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.18 | bwd_microstep: 5178.94 | bwd_inner_microstep: 5103.26 | bwd_allreduce_microstep: 75.62 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3668 [2024-07-29 19:12:34,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.05 | bwd_microstep: 5018.83 | bwd_inner_microstep: 4948.80 | bwd_allreduce_microstep: 69.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653 [2024-07-29 19:12:43,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 19:12:43,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.05 | 
bwd_microstep: 5026.39 | bwd_inner_microstep: 4972.14 | bwd_allreduce_microstep: 54.19 | step_microstep: 180.82 [2024-07-29 19:12:43,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28598.96 | bwd: 40622.79 | bwd_inner: 39521.41 | bwd_allreduce: 1100.90 | step: 181.41 57%|█████▋ | 385/671 [7:29:29<5:30:46, 69.39s/it] {'loss': 1.1295, 'learning_rate': 8.126186854142754e-06, 'epoch': 0.57} 57%|█████▋ | 385/671 [7:29:29<5:30:46, 69.39s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3595 [2024-07-29 19:12:52,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.88 | bwd_microstep: 5341.62 | bwd_inner_microstep: 5239.21 | bwd_allreduce_microstep: 102.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2292 [2024-07-29 19:13:01,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.39 | bwd_microstep: 5205.15 | bwd_inner_microstep: 4799.72 | bwd_allreduce_microstep: 405.36 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3828 [2024-07-29 19:13:10,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.53 | bwd_microstep: 5044.21 | bwd_inner_microstep: 5024.85 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3788 [2024-07-29 19:13:19,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3767.57 | bwd_microstep: 5047.09 | bwd_inner_microstep: 5023.79 | bwd_allreduce_microstep: 23.23 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2093 [2024-07-29 19:13:26,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2925.67 | bwd_microstep: 4752.22 | bwd_inner_microstep: 4384.63 | bwd_allreduce_microstep: 367.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 
3714 [2024-07-29 19:13:35,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.53 | bwd_microstep: 5190.21 | bwd_inner_microstep: 5130.79 | bwd_allreduce_microstep: 59.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201 [2024-07-29 19:13:44,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.73 | bwd_microstep: 5130.30 | bwd_inner_microstep: 4733.04 | bwd_allreduce_microstep: 397.20 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-29 19:13:53,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 19:13:53,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3500.54 | bwd_microstep: 5108.92 | bwd_inner_microstep: 4712.16 | bwd_allreduce_microstep: 396.69 | step_microstep: 182.51 [2024-07-29 19:13:53,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28283.75 | bwd: 40819.69 | bwd_inner: 39048.12 | bwd_allreduce: 1771.10 | step: 183.09 58%|█████▊ | 386/671 [7:30:39<5:29:40, 69.41s/it] {'loss': 1.1561, 'learning_rate': 8.078732826462917e-06, 'epoch': 0.57} 58%|█████▊ | 386/671 [7:30:39<5:29:40, 69.41s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-29 19:14:02,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3647.20 | bwd_microstep: 5299.13 | bwd_inner_microstep: 5210.86 | bwd_allreduce_microstep: 88.20 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3800 [2024-07-29 19:14:10,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.64 | bwd_microstep: 5033.89 | bwd_inner_microstep: 5014.47 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3598 [2024-07-29 19:14:19,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3590.08 | bwd_microstep: 5128.22 | bwd_inner_microstep: 5054.73 | bwd_allreduce_microstep: 73.41 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754 [2024-07-29 19:14:28,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.24 | bwd_microstep: 5004.83 | bwd_inner_microstep: 4985.52 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-07-29 19:14:37,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.73 | bwd_microstep: 5089.75 | bwd_inner_microstep: 5044.07 | bwd_allreduce_microstep: 45.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-29 19:14:46,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3866.37 | bwd_microstep: 5081.07 | bwd_inner_microstep: 5061.64 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-29 19:14:54,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.34 | bwd_microstep: 5183.51 | bwd_inner_microstep: 5105.56 | bwd_allreduce_microstep: 77.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-29 19:15:03,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 19:15:03,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.00 | bwd_microstep: 5062.05 | bwd_inner_microstep: 5002.41 | bwd_allreduce_microstep: 59.57 | step_microstep: 181.01 [2024-07-29 19:15:03,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29368.52 | bwd: 40882.43 | bwd_inner: 40479.20 | bwd_allreduce: 402.75 | step: 181.57 58%|█████▊ | 387/671 [7:31:49<5:30:11, 69.76s/it] {'loss': 1.1393, 'learning_rate': 8.0313236795169e-06, 'epoch': 0.58} 
58%|█████▊ | 387/671 [7:31:49<5:30:11, 69.76s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3884 [2024-07-29 19:15:11,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3291.54 | bwd_microstep: 4951.17 | bwd_inner_microstep: 4932.09 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3577 [2024-07-29 19:15:20,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.96 | bwd_microstep: 5223.50 | bwd_inner_microstep: 5129.51 | bwd_allreduce_microstep: 93.91 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2183 [2024-07-29 19:15:29,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3482.89 | bwd_microstep: 5143.83 | bwd_inner_microstep: 4741.76 | bwd_allreduce_microstep: 401.99 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3612 [2024-07-29 19:15:38,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.61 | bwd_microstep: 5094.93 | bwd_inner_microstep: 5004.28 | bwd_allreduce_microstep: 90.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725 [2024-07-29 19:15:46,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3253.02 | bwd_microstep: 4856.03 | bwd_inner_microstep: 4828.60 | bwd_allreduce_microstep: 27.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653 [2024-07-29 19:15:54,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.65 | bwd_microstep: 5020.40 | bwd_inner_microstep: 4963.50 | bwd_allreduce_microstep: 56.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658 [2024-07-29 19:16:03,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.28 | 
bwd_microstep: 5053.88 | bwd_inner_microstep: 4990.86 | bwd_allreduce_microstep: 62.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3686 [2024-07-29 19:16:12,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 19:16:12,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.34 | bwd_microstep: 4974.84 | bwd_inner_microstep: 4927.92 | bwd_allreduce_microstep: 46.86 | step_microstep: 181.65 [2024-07-29 19:16:12,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27898.21 | bwd: 40318.55 | bwd_inner: 39518.45 | bwd_allreduce: 799.60 | step: 182.21 58%|█████▊ | 388/671 [7:32:58<5:27:18, 69.39s/it] {'loss': 1.1318, 'learning_rate': 7.983960520780712e-06, 'epoch': 0.58} 58%|█████▊ | 388/671 [7:32:58<5:27:18, 69.39s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-29 19:16:21,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.57 | bwd_microstep: 5271.24 | bwd_inner_microstep: 5191.70 | bwd_allreduce_microstep: 79.46 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3808 [2024-07-29 19:16:29,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.94 | bwd_microstep: 5024.69 | bwd_inner_microstep: 5005.30 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3779 [2024-07-29 19:16:38,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.77 | bwd_microstep: 5208.58 | bwd_inner_microstep: 5152.92 | bwd_allreduce_microstep: 55.60 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-29 19:16:47,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.60 | bwd_microstep: 4978.27 | bwd_inner_microstep: 4958.92 | bwd_allreduce_microstep: 
19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3712 [2024-07-29 19:16:56,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.67 | bwd_microstep: 5152.93 | bwd_inner_microstep: 5083.99 | bwd_allreduce_microstep: 68.87 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-29 19:17:04,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3206.57 | bwd_microstep: 4791.30 | bwd_inner_microstep: 4756.58 | bwd_allreduce_microstep: 34.65 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3678 [2024-07-29 19:17:12,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.74 | bwd_microstep: 4876.35 | bwd_inner_microstep: 4856.96 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3667 [2024-07-29 19:17:21,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.46 [2024-07-29 19:17:21,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.18 | bwd_microstep: 5048.57 | bwd_inner_microstep: 4975.45 | bwd_allreduce_microstep: 73.05 | step_microstep: 181.58 [2024-07-29 19:17:21,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28835.96 | bwd: 40351.89 | bwd_inner: 39981.75 | bwd_allreduce: 369.65 | step: 182.18 58%|█████▊ | 389/671 [7:34:07<5:26:20, 69.43s/it] {'loss': 1.1277, 'learning_rate': 7.936644456656082e-06, 'epoch': 0.58} 58%|█████▊ | 389/671 [7:34:07<5:26:20, 69.43s/it]dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3584 [2024-07-29 19:17:30,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.47 | bwd_microstep: 5351.12 | bwd_inner_microstep: 5198.09 | bwd_allreduce_microstep: 152.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3591 [2024-07-29 19:17:39,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.81 | bwd_microstep: 5206.25 | bwd_inner_microstep: 5115.50 | bwd_allreduce_microstep: 90.68 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2299 [2024-07-29 19:17:48,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.22 | bwd_microstep: 5163.58 | bwd_inner_microstep: 4761.22 | bwd_allreduce_microstep: 402.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3745 [2024-07-29 19:17:57,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.89 | bwd_microstep: 5150.10 | bwd_inner_microstep: 5097.88 | bwd_allreduce_microstep: 52.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3734 [2024-07-29 19:18:05,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3223.99 | bwd_microstep: 4792.92 | bwd_inner_microstep: 4773.47 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2172 [2024-07-29 19:18:13,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3025.60 | bwd_microstep: 4924.68 | bwd_inner_microstep: 4548.39 | bwd_allreduce_microstep: 376.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728 [2024-07-29 19:18:21,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.88 | bwd_microstep: 5028.38 | bwd_inner_microstep: 4990.47 | bwd_allreduce_microstep: 37.85 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-29 19:18:30,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 19:18:30,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.00 | 
bwd_microstep: 4985.95 | bwd_inner_microstep: 4934.76 | bwd_allreduce_microstep: 51.13 | step_microstep: 180.97 [2024-07-29 19:18:30,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27827.78 | bwd: 40602.96 | bwd_inner: 39419.72 | bwd_allreduce: 1182.77 | step: 181.56 58%|█████▊ | 390/671 [7:35:16<5:24:14, 69.23s/it] {'loss': 1.2151, 'learning_rate': 7.889376592444605e-06, 'epoch': 0.58} 58%|█████▊ | 390/671 [7:35:16<5:24:14, 69.23s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2017 [2024-07-29 19:18:39,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.96 | bwd_microstep: 5379.23 | bwd_inner_microstep: 4963.06 | bwd_allreduce_microstep: 416.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3592 [2024-07-29 19:18:47,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3205.02 | bwd_microstep: 4794.31 | bwd_inner_microstep: 4753.11 | bwd_allreduce_microstep: 41.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3759 [2024-07-29 19:18:56,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.87 | bwd_microstep: 5189.58 | bwd_inner_microstep: 5129.07 | bwd_allreduce_microstep: 60.44 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3742 [2024-07-29 19:19:05,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.55 | bwd_microstep: 4986.30 | bwd_inner_microstep: 4967.00 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-29 19:19:13,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.12 | bwd_microstep: 5103.09 | bwd_inner_microstep: 5032.86 | bwd_allreduce_microstep: 70.17 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 
3716 [2024-07-29 19:19:22,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.80 | bwd_microstep: 5054.48 | bwd_inner_microstep: 5024.96 | bwd_allreduce_microstep: 29.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702 [2024-07-29 19:19:31,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.80 | bwd_microstep: 5036.79 | bwd_inner_microstep: 4984.22 | bwd_allreduce_microstep: 52.50 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-07-29 19:19:40,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.80 [2024-07-29 19:19:40,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.51 | bwd_microstep: 4899.51 | bwd_inner_microstep: 4880.09 | bwd_allreduce_microstep: 19.35 | step_microstep: 180.79 [2024-07-29 19:19:40,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28782.53 | bwd: 40443.27 | bwd_inner: 39734.32 | bwd_allreduce: 708.49 | step: 181.35 58%|█████▊ | 391/671 [7:36:26<5:23:31, 69.33s/it] {'loss': 1.1762, 'learning_rate': 7.84215803232194e-06, 'epoch': 0.58} 58%|█████▊ | 391/671 [7:36:26<5:23:31, 69.33s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3900 [2024-07-29 19:19:49,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3690.94 | bwd_microstep: 5318.20 | bwd_inner_microstep: 5255.40 | bwd_allreduce_microstep: 62.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3593 [2024-07-29 19:19:57,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.21 | bwd_microstep: 5134.82 | bwd_inner_microstep: 5059.00 | bwd_allreduce_microstep: 75.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607 [2024-07-29 19:20:06,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3605.90 | bwd_microstep: 5152.15 | bwd_inner_microstep: 5069.36 | bwd_allreduce_microstep: 82.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621 [2024-07-29 19:20:15,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.44 | bwd_microstep: 5189.49 | bwd_inner_microstep: 5104.98 | bwd_allreduce_microstep: 84.44 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1192 [2024-07-29 19:20:24,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3508.97 | bwd_microstep: 5236.14 | bwd_inner_microstep: 4829.94 | bwd_allreduce_microstep: 406.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-29 19:20:32,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3168.84 | bwd_microstep: 4690.91 | bwd_inner_microstep: 4667.02 | bwd_allreduce_microstep: 23.83 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-29 19:20:40,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.56 | bwd_microstep: 5054.17 | bwd_inner_microstep: 4995.91 | bwd_allreduce_microstep: 58.19 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3666 [2024-07-29 19:20:49,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.43 [2024-07-29 19:20:49,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.08 | bwd_microstep: 5053.02 | bwd_inner_microstep: 4979.83 | bwd_allreduce_microstep: 73.12 | step_microstep: 181.01 [2024-07-29 19:20:49,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28391.85 | bwd: 40828.87 | bwd_inner: 39961.39 | bwd_allreduce: 867.02 | step: 181.58 58%|█████▊ | 392/671 [7:37:35<5:22:40, 69.39s/it] {'loss': 1.1384, 'learning_rate': 7.794989879311991e-06, 'epoch': 0.58} 
58%|█████▊ | 392/671 [7:37:35<5:22:40, 69.39s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3753 [2024-07-29 19:20:58,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.17 | bwd_microstep: 5303.69 | bwd_inner_microstep: 5234.74 | bwd_allreduce_microstep: 68.88 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3823 [2024-07-29 19:21:07,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3788.62 | bwd_microstep: 5060.07 | bwd_inner_microstep: 5040.16 | bwd_allreduce_microstep: 19.83 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2056 [2024-07-29 19:21:16,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.89 | bwd_microstep: 5312.96 | bwd_inner_microstep: 4902.57 | bwd_allreduce_microstep: 410.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3786 [2024-07-29 19:21:25,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.68 | bwd_microstep: 5104.34 | bwd_inner_microstep: 5033.71 | bwd_allreduce_microstep: 70.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3741 [2024-07-29 19:21:33,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3234.17 | bwd_microstep: 4856.39 | bwd_inner_microstep: 4833.12 | bwd_allreduce_microstep: 23.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-29 19:21:41,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.52 | bwd_microstep: 5105.31 | bwd_inner_microstep: 5037.98 | bwd_allreduce_microstep: 67.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-29 19:21:50,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.02 | 
bwd_microstep: 5198.66 | bwd_inner_microstep: 5118.23 | bwd_allreduce_microstep: 80.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-07-29 19:21:59,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-29 19:21:59,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.70 | bwd_microstep: 5017.51 | bwd_inner_microstep: 4963.93 | bwd_allreduce_microstep: 53.52 | step_microstep: 181.97 [2024-07-29 19:21:59,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28656.68 | bwd: 40958.90 | bwd_inner: 40164.39 | bwd_allreduce: 794.04 | step: 182.55 59%|█████▊ | 393/671 [7:38:45<5:22:17, 69.56s/it] {'loss': 1.0996, 'learning_rate': 7.74787323526116e-06, 'epoch': 0.58} 59%|█████▊ | 393/671 [7:38:45<5:22:17, 69.56s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3554 [2024-07-29 19:22:08,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.02 | bwd_microstep: 5355.26 | bwd_inner_microstep: 5178.62 | bwd_allreduce_microstep: 176.58 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2310 [2024-07-29 19:22:17,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.61 | bwd_microstep: 5340.40 | bwd_inner_microstep: 4926.54 | bwd_allreduce_microstep: 413.79 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3810 [2024-07-29 19:22:26,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.58 | bwd_microstep: 5191.33 | bwd_inner_microstep: 5119.99 | bwd_allreduce_microstep: 71.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3678 [2024-07-29 19:22:35,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.50 | bwd_microstep: 5146.84 | bwd_inner_microstep: 5058.04 | bwd_allreduce_microstep: 88.73 | 
step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2101 [2024-07-29 19:22:43,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.57 | bwd_microstep: 5215.21 | bwd_inner_microstep: 4809.61 | bwd_allreduce_microstep: 405.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3755 [2024-07-29 19:22:52,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.67 | bwd_microstep: 5100.01 | bwd_inner_microstep: 5054.50 | bwd_allreduce_microstep: 45.44 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-29 19:23:01,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.65 | bwd_microstep: 4981.34 | bwd_inner_microstep: 4961.79 | bwd_allreduce_microstep: 19.48 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3685 [2024-07-29 19:23:10,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 19:23:10,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3698.28 | bwd_microstep: 4885.11 | bwd_inner_microstep: 4865.71 | bwd_allreduce_microstep: 19.33 | step_microstep: 181.11 [2024-07-29 19:23:10,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29105.78 | bwd: 41215.49 | bwd_inner: 39974.74 | bwd_allreduce: 1240.25 | step: 181.70 59%|█████▊ | 394/671 [7:39:56<5:22:38, 69.89s/it] {'loss': 1.1601, 'learning_rate': 7.700809200812598e-06, 'epoch': 0.59} 59%|█████▊ | 394/671 [7:39:56<5:22:38, 69.89s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2397 [2024-07-29 19:23:18,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3061.86 | bwd_microstep: 5023.79 | bwd_inner_microstep: 4636.99 | bwd_allreduce_microstep: 386.74 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic 
token length: 2230 [2024-07-29 19:23:26,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3083.90 | bwd_microstep: 5068.75 | bwd_inner_microstep: 4679.28 | bwd_allreduce_microstep: 389.40 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2211 [2024-07-29 19:23:34,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3075.98 | bwd_microstep: 5045.24 | bwd_inner_microstep: 4656.23 | bwd_allreduce_microstep: 388.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621 [2024-07-29 19:23:43,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.35 | bwd_microstep: 5157.03 | bwd_inner_microstep: 5075.04 | bwd_allreduce_microstep: 81.91 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3614 [2024-07-29 19:23:51,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3119.01 | bwd_microstep: 4959.13 | bwd_inner_microstep: 4893.28 | bwd_allreduce_microstep: 65.77 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-29 19:24:00,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.98 | bwd_microstep: 5041.57 | bwd_inner_microstep: 4980.74 | bwd_allreduce_microstep: 60.76 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-29 19:24:08,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.00 | bwd_microstep: 5056.71 | bwd_inner_microstep: 5032.17 | bwd_allreduce_microstep: 24.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-29 19:24:17,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 19:24:17,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.52 | 
bwd_microstep: 5030.65 | bwd_inner_microstep: 4977.32 | bwd_allreduce_microstep: 53.26 | step_microstep: 181.21 [2024-07-29 19:24:17,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 26851.53 | bwd: 40382.85 | bwd_inner: 38931.00 | bwd_allreduce: 1451.36 | step: 181.90 59%|█████▉ | 395/671 [7:41:03<5:18:16, 69.19s/it] {'loss': 1.1501, 'learning_rate': 7.653798875380498e-06, 'epoch': 0.59} 59%|█████▉ | 395/671 [7:41:03<5:18:16, 69.19s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3664 [2024-07-29 19:24:26,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3684.97 | bwd_microstep: 5387.56 | bwd_inner_microstep: 5267.15 | bwd_allreduce_microstep: 120.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3577 [2024-07-29 19:24:35,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3214.65 | bwd_microstep: 5021.64 | bwd_inner_microstep: 4950.07 | bwd_allreduce_microstep: 71.50 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2068 [2024-07-29 19:24:43,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.72 | bwd_microstep: 5199.92 | bwd_inner_microstep: 4796.78 | bwd_allreduce_microstep: 403.07 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2196 [2024-07-29 19:24:52,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3482.63 | bwd_microstep: 5130.99 | bwd_inner_microstep: 4735.17 | bwd_allreduce_microstep: 395.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713 [2024-07-29 19:25:01,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3711.24 | bwd_microstep: 4979.19 | bwd_inner_microstep: 4959.78 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 
3651 [2024-07-29 19:25:09,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.83 | bwd_microstep: 5019.23 | bwd_inner_microstep: 4968.39 | bwd_allreduce_microstep: 50.78 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2162 [2024-07-29 19:25:18,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.01 | bwd_microstep: 5098.77 | bwd_inner_microstep: 4702.67 | bwd_allreduce_microstep: 396.00 | step_microstep: 0.11 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-07-29 19:25:27,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 19:25:27,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.37 | bwd_microstep: 5183.59 | bwd_inner_microstep: 5105.77 | bwd_allreduce_microstep: 77.74 | step_microstep: 180.56 [2024-07-29 19:25:27,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28312.33 | bwd: 41020.87 | bwd_inner: 39485.73 | bwd_allreduce: 1534.64 | step: 181.17 59%|█████▉ | 396/671 [7:42:13<5:17:46, 69.33s/it] {'loss': 1.1391, 'learning_rate': 7.6068433571244234e-06, 'epoch': 0.59} 59%|█████▉ | 396/671 [7:42:13<5:17:46, 69.33s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-29 19:25:36,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.37 | bwd_microstep: 5348.88 | bwd_inner_microstep: 5256.10 | bwd_allreduce_microstep: 92.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3817 [2024-07-29 19:25:45,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.76 | bwd_microstep: 5175.49 | bwd_inner_microstep: 5124.89 | bwd_allreduce_microstep: 50.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2205 [2024-07-29 19:25:54,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3576.38 | bwd_microstep: 5272.15 | bwd_inner_microstep: 4863.70 | bwd_allreduce_microstep: 408.39 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2194 [2024-07-29 19:26:02,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.33 | bwd_microstep: 5195.05 | bwd_inner_microstep: 4790.50 | bwd_allreduce_microstep: 404.49 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3710 [2024-07-29 19:26:11,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.86 | bwd_microstep: 4913.57 | bwd_inner_microstep: 4890.68 | bwd_allreduce_microstep: 22.83 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2188 [2024-07-29 19:26:20,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.58 | bwd_microstep: 5141.52 | bwd_inner_microstep: 4740.99 | bwd_allreduce_microstep: 400.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3780 [2024-07-29 19:26:28,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3230.01 | bwd_microstep: 4825.84 | bwd_inner_microstep: 4806.46 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2209 [2024-07-29 19:26:37,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 19:26:37,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.39 | bwd_microstep: 5232.65 | bwd_inner_microstep: 4827.27 | bwd_allreduce_microstep: 405.31 | step_microstep: 181.79 [2024-07-29 19:26:37,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28407.59 | bwd: 41105.16 | bwd_inner: 39300.54 | bwd_allreduce: 1804.14 | step: 182.36 59%|█████▉ | 397/671 [7:43:23<5:17:18, 69.48s/it] {'loss': 1.1429, 'learning_rate': 7.559943742923626e-06, 'epoch': 
0.59} 59%|█████▉ | 397/671 [7:43:23<5:17:18, 69.48s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2438 [2024-07-29 19:26:46,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.53 | bwd_microstep: 5217.77 | bwd_inner_microstep: 4812.09 | bwd_allreduce_microstep: 405.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3780 [2024-07-29 19:26:54,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.30 | bwd_microstep: 5124.48 | bwd_inner_microstep: 5080.30 | bwd_allreduce_microstep: 44.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3744 [2024-07-29 19:27:03,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.55 | bwd_microstep: 5159.10 | bwd_inner_microstep: 5105.98 | bwd_allreduce_microstep: 53.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756 [2024-07-29 19:27:12,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.47 | bwd_microstep: 5009.51 | bwd_inner_microstep: 4990.16 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636 [2024-07-29 19:27:21,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.59 | bwd_microstep: 5089.07 | bwd_inner_microstep: 5020.73 | bwd_allreduce_microstep: 68.27 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2211 [2024-07-29 19:27:28,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3033.73 | bwd_microstep: 4901.08 | bwd_inner_microstep: 4524.40 | bwd_allreduce_microstep: 376.61 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677 [2024-07-29 19:27:37,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.40 
| bwd_microstep: 5040.20 | bwd_inner_microstep: 4980.64 | bwd_allreduce_microstep: 59.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695 [2024-07-29 19:27:46,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 19:27:46,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.37 | bwd_microstep: 5012.79 | bwd_inner_microstep: 4958.99 | bwd_allreduce_microstep: 53.73 | step_microstep: 185.81 [2024-07-29 19:27:46,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28184.85 | bwd: 40553.98 | bwd_inner: 39473.24 | bwd_allreduce: 1080.26 | step: 186.40 59%|█████▉ | 398/671 [7:44:32<5:15:35, 69.36s/it] {'loss': 1.1895, 'learning_rate': 7.513101128351454e-06, 'epoch': 0.59} 59%|█████▉ | 398/671 [7:44:32<5:15:35, 69.36s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3878 [2024-07-29 19:27:55,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3651.60 | bwd_microstep: 5343.16 | bwd_inner_microstep: 5263.58 | bwd_allreduce_microstep: 79.51 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3794 [2024-07-29 19:28:04,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.90 | bwd_microstep: 5164.38 | bwd_inner_microstep: 5093.11 | bwd_allreduce_microstep: 71.21 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3800 [2024-07-29 19:28:12,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.32 | bwd_microstep: 5183.43 | bwd_inner_microstep: 5132.48 | bwd_allreduce_microstep: 50.89 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3759 [2024-07-29 19:28:21,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3130.02 | bwd_microstep: 4999.95 | bwd_inner_microstep: 4950.72 | bwd_allreduce_microstep: 
49.16 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2188 [2024-07-29 19:28:29,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3475.83 | bwd_microstep: 5105.88 | bwd_inner_microstep: 4709.90 | bwd_allreduce_microstep: 395.92 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2178 [2024-07-29 19:28:38,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3443.96 | bwd_microstep: 5028.35 | bwd_inner_microstep: 4638.55 | bwd_allreduce_microstep: 389.74 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720 [2024-07-29 19:28:46,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.34 | bwd_microstep: 4990.73 | bwd_inner_microstep: 4971.41 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667 [2024-07-29 19:28:55,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 19:28:55,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.97 | bwd_microstep: 5050.59 | bwd_inner_microstep: 4992.36 | bwd_allreduce_microstep: 58.17 | step_microstep: 182.72 [2024-07-29 19:28:55,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28221.86 | bwd: 40866.46 | bwd_inner: 39752.05 | bwd_allreduce: 1113.95 | step: 183.31 59%|█████▉ | 399/671 [7:45:41<5:14:30, 69.38s/it] {'loss': 1.171, 'learning_rate': 7.466316607649735e-06, 'epoch': 0.59} 59%|█████▉ | 399/671 [7:45:41<5:14:30, 69.38s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3949 [2024-07-29 19:29:04,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3660.58 | bwd_microstep: 5111.91 | bwd_inner_microstep: 5084.74 | bwd_allreduce_microstep: 27.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3619 [2024-07-29 19:29:13,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.40 | bwd_microstep: 5164.79 | bwd_inner_microstep: 5086.37 | bwd_allreduce_microstep: 78.34 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2256 [2024-07-29 19:29:22,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.90 | bwd_microstep: 5251.45 | bwd_inner_microstep: 4845.81 | bwd_allreduce_microstep: 405.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3799 [2024-07-29 19:29:30,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.13 | bwd_microstep: 5155.59 | bwd_inner_microstep: 5107.12 | bwd_allreduce_microstep: 48.40 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2256 [2024-07-29 19:29:39,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.94 | bwd_microstep: 5174.21 | bwd_inner_microstep: 4774.84 | bwd_allreduce_microstep: 399.30 | step_microstep: 0.10 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3712 [2024-07-29 19:29:48,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.69 | bwd_microstep: 5107.22 | bwd_inner_microstep: 5027.67 | bwd_allreduce_microstep: 79.48 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3767 [2024-07-29 19:29:57,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.23 | bwd_microstep: 5023.23 | bwd_inner_microstep: 5003.86 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149 [2024-07-29 19:30:05,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 19:30:05,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3466.37 | 
bwd_microstep: 5055.82 | bwd_inner_microstep: 4662.89 | bwd_allreduce_microstep: 392.87 | step_microstep: 181.11 [2024-07-29 19:30:05,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28766.13 | bwd: 41044.19 | bwd_inner: 39593.24 | bwd_allreduce: 1450.48 | step: 181.69 60%|█████▉ | 400/671 [7:46:51<5:14:23, 69.61s/it] {'loss': 1.1202, 'learning_rate': 7.419591273703245e-06, 'epoch': 0.6} 60%|█████▉ | 400/671 [7:46:51<5:14:23, 69.61s/it][INFO|trainer.py:2936] 2024-07-29 19:30:33,065 >> Saving model checkpoint to /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-400 [INFO|configuration_utils.py:473] 2024-07-29 19:30:33,067 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-400/config.json [INFO|configuration_utils.py:594] 2024-07-29 19:30:33,067 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-400/generation_config.json [INFO|modeling_utils.py:2501] 2024-07-29 19:31:26,214 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 11 checkpoint shards. You can find where each parameters has been saved in the index located at /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-400/model.safetensors.index.json. 
[INFO|tokenization_utils_base.py:2433] 2024-07-29 19:31:26,216 >> tokenizer config file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-400/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-07-29 19:31:26,216 >> Special tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-400/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-07-29 19:31:26,216 >> added tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-400/added_tokens.json
[2024-07-29 19:31:26,254] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step400 is about to be saved!
[2024-07-29 19:31:29,259] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-400/global_step400/zero_pp_rank_0_mp_rank_00_model_states.pt
[2024-07-29 19:31:29,260] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-400/global_step400/zero_pp_rank_0_mp_rank_00_model_states.pt...
[2024-07-29 19:31:30,514] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-400/global_step400/zero_pp_rank_0_mp_rank_00_model_states.pt.
[2024-07-29 19:31:30,551] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-400/global_step400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-07-29 19:32:23,182] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-400/global_step400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-07-29 19:32:23,183] [INFO] [engine.py:3431:_save_zero_checkpoint] zero checkpoint saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-400/global_step400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-07-29 19:32:29,778] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step400 is ready now! dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3859 [2024-07-29 19:32:38,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.49 | bwd_microstep: 5064.05 | bwd_inner_microstep: 5029.29 | bwd_allreduce_microstep: 34.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3584 [2024-07-29 19:32:46,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3187.86 | bwd_microstep: 4814.17 | bwd_inner_microstep: 4766.05 | bwd_allreduce_microstep: 48.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730 [2024-07-29 19:32:55,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.45 | bwd_microstep: 5011.27 | bwd_inner_microstep: 4985.35 | bwd_allreduce_microstep: 25.85 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2189 [2024-07-29 19:33:03,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3477.71 | bwd_microstep: 5121.00 | bwd_inner_microstep: 4723.59 | bwd_allreduce_microstep: 397.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-29 19:33:12,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.34 | bwd_microstep: 5035.27 | bwd_inner_microstep: 4979.47 | bwd_allreduce_microstep: 55.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3651 [2024-07-29 19:33:21,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3591.63 | bwd_microstep: 5148.54 | bwd_inner_microstep: 5081.91 | bwd_allreduce_microstep: 66.57 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3684 [2024-07-29 19:33:29,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3129.73 | bwd_microstep: 4706.73 | bwd_inner_microstep: 4680.20 | bwd_allreduce_microstep: 26.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-07-29 19:33:37,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 19:33:37,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.22 | bwd_microstep: 5026.56 | bwd_inner_microstep: 4972.88 | bwd_allreduce_microstep: 53.60 | step_microstep: 181.12 [2024-07-29 19:33:37,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27851.34 | bwd: 39927.56 | bwd_inner: 39218.67 | bwd_allreduce: 708.41 | step: 181.70 60%|█████▉ | 401/671 [7:50:23<8:25:27, 112.33s/it] {'loss': 1.1702, 'learning_rate': 7.372926218014131e-06, 'epoch': 0.6} 60%|█████▉ | 401/671 [7:50:23<8:25:27, 112.33s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2341 [2024-07-29 19:33:46,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.00 | bwd_microstep: 5334.34 | bwd_inner_microstep: 4924.43 | bwd_allreduce_microstep: 409.85 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3839 [2024-07-29 19:33:55,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.95 | bwd_microstep: 5036.82 | bwd_inner_microstep: 5017.43 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3744 [2024-07-29 19:34:04,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.06 | bwd_microstep: 5090.02 | bwd_inner_microstep: 5021.62 | 
bwd_allreduce_microstep: 68.34 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-29 19:34:13,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.17 | bwd_microstep: 5105.14 | bwd_inner_microstep: 5057.96 | bwd_allreduce_microstep: 47.12 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3631 [2024-07-29 19:34:21,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.95 | bwd_microstep: 5144.47 | bwd_inner_microstep: 5057.75 | bwd_allreduce_microstep: 86.62 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3690 [2024-07-29 19:34:30,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.83 | bwd_microstep: 5041.16 | bwd_inner_microstep: 4981.56 | bwd_allreduce_microstep: 59.53 | step_microstep: 0.07 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-29 19:34:39,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.57 | bwd_microstep: 4905.87 | bwd_inner_microstep: 4884.13 | bwd_allreduce_microstep: 21.67 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2129 [2024-07-29 19:34:47,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 19:34:47,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.13 | bwd_microstep: 5081.82 | bwd_inner_microstep: 4687.88 | bwd_allreduce_microstep: 393.87 | step_microstep: 180.51 [2024-07-29 19:34:47,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28865.59 | bwd: 40739.62 | bwd_inner: 39632.70 | bwd_allreduce: 1106.44 | step: 181.20 60%|█████▉ | 402/671 [7:51:33<7:26:34, 99.61s/it] {'loss': 1.1175, 'learning_rate': 7.326322530676471e-06, 'epoch': 0.6} 60%|█████▉ | 402/671 [7:51:33<7:26:34, 99.61s/it]dynamic ViT batch size: 26, 
images per sample: 13.0, dynamic token length: 3955 [2024-07-29 19:34:56,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3798.65 | bwd_microstep: 5186.57 | bwd_inner_microstep: 5167.51 | bwd_allreduce_microstep: 18.99 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2280 [2024-07-29 19:35:05,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.62 | bwd_microstep: 5351.84 | bwd_inner_microstep: 4938.16 | bwd_allreduce_microstep: 413.61 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2231 [2024-07-29 19:35:14,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.20 | bwd_microstep: 5191.48 | bwd_inner_microstep: 4787.17 | bwd_allreduce_microstep: 404.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3759 [2024-07-29 19:35:22,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3439.00 | bwd_microstep: 4865.99 | bwd_inner_microstep: 4842.92 | bwd_allreduce_microstep: 23.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721 [2024-07-29 19:35:31,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.59 | bwd_microstep: 5091.83 | bwd_inner_microstep: 5047.07 | bwd_allreduce_microstep: 44.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3656 [2024-07-29 19:35:40,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.98 | bwd_microstep: 5091.42 | bwd_inner_microstep: 5030.34 | bwd_allreduce_microstep: 61.02 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3701 [2024-07-29 19:35:48,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.31 | bwd_microstep: 5053.43 | bwd_inner_microstep: 4982.05 | bwd_allreduce_microstep: 
71.32 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2135 [2024-07-29 19:35:57,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.45 [2024-07-29 19:35:57,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3473.80 | bwd_microstep: 5052.62 | bwd_inner_microstep: 4662.70 | bwd_allreduce_microstep: 389.86 | step_microstep: 182.28 [2024-07-29 19:35:57,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28595.06 | bwd: 40885.18 | bwd_inner: 39457.87 | bwd_allreduce: 1426.84 | step: 182.97 60%|██████ | 403/671 [7:52:43<6:44:59, 90.67s/it] {'loss': 1.1549, 'learning_rate': 7.27978130035076e-06, 'epoch': 0.6} 60%|██████ | 403/671 [7:52:43<6:44:59, 90.67s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3542 [2024-07-29 19:36:05,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3137.21 | bwd_microstep: 5072.84 | bwd_inner_microstep: 4980.28 | bwd_allreduce_microstep: 92.49 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2238 [2024-07-29 19:36:14,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.88 | bwd_microstep: 5145.59 | bwd_inner_microstep: 4745.38 | bwd_allreduce_microstep: 400.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3764 [2024-07-29 19:36:22,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3221.61 | bwd_microstep: 4813.72 | bwd_inner_microstep: 4794.33 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3624 [2024-07-29 19:36:31,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.50 | bwd_microstep: 5197.32 | bwd_inner_microstep: 5092.47 | bwd_allreduce_microstep: 104.79 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, 
dynamic token length: 2182 [2024-07-29 19:36:40,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.04 | bwd_microstep: 5216.15 | bwd_inner_microstep: 4811.91 | bwd_allreduce_microstep: 404.17 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-29 19:36:48,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.41 | bwd_microstep: 5043.94 | bwd_inner_microstep: 5001.08 | bwd_allreduce_microstep: 42.80 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171 [2024-07-29 19:36:57,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.67 | bwd_microstep: 5102.20 | bwd_inner_microstep: 4705.67 | bwd_allreduce_microstep: 396.47 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-29 19:37:06,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 19:37:06,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.55 | bwd_microstep: 5106.76 | bwd_inner_microstep: 5040.44 | bwd_allreduce_microstep: 66.25 | step_microstep: 181.17 [2024-07-29 19:37:06,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27717.79 | bwd: 40698.50 | bwd_inner: 39171.49 | bwd_allreduce: 1526.53 | step: 181.85 60%|██████ | 404/671 [7:53:52<6:14:12, 84.09s/it] {'loss': 1.167, 'learning_rate': 7.233303614238511e-06, 'epoch': 0.6} 60%|██████ | 404/671 [7:53:52<6:14:12, 84.09s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3544 [2024-07-29 19:37:15,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.40 | bwd_microstep: 5230.25 | bwd_inner_microstep: 5136.24 | bwd_allreduce_microstep: 93.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3569 [2024-07-29 19:37:23,324] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 3181.48 | bwd_microstep: 4816.72 | bwd_inner_microstep: 4768.61 | bwd_allreduce_microstep: 48.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3749 [2024-07-29 19:37:32,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.51 | bwd_microstep: 5003.66 | bwd_inner_microstep: 4984.29 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756 [2024-07-29 19:37:40,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.80 | bwd_microstep: 5065.86 | bwd_inner_microstep: 5039.51 | bwd_allreduce_microstep: 26.29 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626 [2024-07-29 19:37:48,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3202.24 | bwd_microstep: 4805.06 | bwd_inner_microstep: 4766.93 | bwd_allreduce_microstep: 38.06 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2173 [2024-07-29 19:37:57,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.46 | bwd_microstep: 5223.71 | bwd_inner_microstep: 4818.16 | bwd_allreduce_microstep: 405.49 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-29 19:38:06,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.64 | bwd_microstep: 5032.19 | bwd_inner_microstep: 5007.84 | bwd_allreduce_microstep: 24.29 | step_microstep: 0.20 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674 [2024-07-29 19:38:15,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 19:38:15,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.56 | bwd_microstep: 4956.06 | bwd_inner_microstep: 4911.21 | bwd_allreduce_microstep: 44.77 | 
step_microstep: 208.81 [2024-07-29 19:38:15,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28366.99 | bwd: 40133.50 | bwd_inner: 39432.72 | bwd_allreduce: 700.30 | step: 209.52 60%|██████ | 405/671 [7:55:01<5:52:33, 79.52s/it] {'loss': 1.1407, 'learning_rate': 7.186890558056836e-06, 'epoch': 0.6} 60%|██████ | 405/671 [7:55:01<5:52:33, 79.52s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3945 [2024-07-29 19:38:24,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3816.24 | bwd_microstep: 5193.44 | bwd_inner_microstep: 5174.27 | bwd_allreduce_microstep: 19.10 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768 [2024-07-29 19:38:33,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.53 | bwd_microstep: 5087.55 | bwd_inner_microstep: 5059.12 | bwd_allreduce_microstep: 28.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3796 [2024-07-29 19:38:41,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.10 | bwd_microstep: 5030.12 | bwd_inner_microstep: 5010.42 | bwd_allreduce_microstep: 19.63 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3750 [2024-07-29 19:38:50,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.45 | bwd_microstep: 4999.56 | bwd_inner_microstep: 4980.14 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3738 [2024-07-29 19:38:59,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.27 | bwd_microstep: 5061.92 | bwd_inner_microstep: 5034.89 | bwd_allreduce_microstep: 26.97 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2199 [2024-07-29 19:39:08,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3511.84 | bwd_microstep: 5106.45 | bwd_inner_microstep: 4709.89 | bwd_allreduce_microstep: 396.49 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693 [2024-07-29 19:39:16,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.41 | bwd_microstep: 5037.05 | bwd_inner_microstep: 4995.98 | bwd_allreduce_microstep: 41.00 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-29 19:39:25,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 19:39:25,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.15 | bwd_microstep: 5024.08 | bwd_inner_microstep: 4966.18 | bwd_allreduce_microstep: 57.83 | step_microstep: 182.24 [2024-07-29 19:39:25,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29648.88 | bwd: 40540.16 | bwd_inner: 39930.84 | bwd_allreduce: 608.84 | step: 182.83 61%|██████ | 406/671 [7:56:11<5:39:18, 76.83s/it] {'loss': 1.1076, 'learning_rate': 7.1405432160131076e-06, 'epoch': 0.6} 61%|██████ | 406/671 [7:56:11<5:39:18, 76.83s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3945 [2024-07-29 19:39:34,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.04 | bwd_microstep: 5227.57 | bwd_inner_microstep: 5187.69 | bwd_allreduce_microstep: 39.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3784 [2024-07-29 19:39:43,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.52 | bwd_microstep: 5166.87 | bwd_inner_microstep: 5121.12 | bwd_allreduce_microstep: 45.69 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3780 [2024-07-29 19:39:51,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3242.71 | bwd_microstep: 4848.46 | bwd_inner_microstep: 4828.58 | 
bwd_allreduce_microstep: 19.81 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2143 [2024-07-29 19:40:00,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.46 | bwd_microstep: 5192.66 | bwd_inner_microstep: 4786.51 | bwd_allreduce_microstep: 406.08 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2111 [2024-07-29 19:40:09,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.37 | bwd_microstep: 5205.96 | bwd_inner_microstep: 4800.25 | bwd_allreduce_microstep: 405.65 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3677 [2024-07-29 19:40:17,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3808.18 | bwd_microstep: 5005.25 | bwd_inner_microstep: 4964.63 | bwd_allreduce_microstep: 40.56 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3714 [2024-07-29 19:40:26,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.95 | bwd_microstep: 5017.15 | bwd_inner_microstep: 4977.35 | bwd_allreduce_microstep: 39.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-29 19:40:35,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.44 [2024-07-29 19:40:35,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3499.44 | bwd_microstep: 5329.13 | bwd_inner_microstep: 5152.87 | bwd_allreduce_microstep: 176.20 | step_microstep: 180.78 [2024-07-29 19:40:35,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28446.56 | bwd: 40993.03 | bwd_inner: 39818.95 | bwd_allreduce: 1173.62 | step: 181.46 61%|██████ | 407/671 [7:57:21<5:28:43, 74.71s/it] {'loss': 1.1575, 'learning_rate': 7.0942626707796094e-06, 'epoch': 0.61} 61%|██████ | 407/671 [7:57:21<5:28:43, 74.71s/it]dynamic ViT batch size: 10, 
images per sample: 5.0, dynamic token length: 2045 [2024-07-29 19:40:44,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.60 | bwd_microstep: 5226.72 | bwd_inner_microstep: 4824.28 | bwd_allreduce_microstep: 402.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3786 [2024-07-29 19:40:53,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.65 | bwd_microstep: 5201.32 | bwd_inner_microstep: 5148.12 | bwd_allreduce_microstep: 53.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3600 [2024-07-29 19:41:01,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.29 | bwd_microstep: 5132.80 | bwd_inner_microstep: 5053.13 | bwd_allreduce_microstep: 79.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3763 [2024-07-29 19:41:10,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.21 | bwd_microstep: 5161.18 | bwd_inner_microstep: 5107.83 | bwd_allreduce_microstep: 53.28 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710 [2024-07-29 19:41:19,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.63 | bwd_microstep: 5177.07 | bwd_inner_microstep: 5103.08 | bwd_allreduce_microstep: 73.93 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2153 [2024-07-29 19:41:28,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.58 | bwd_microstep: 5172.16 | bwd_inner_microstep: 4769.59 | bwd_allreduce_microstep: 402.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-29 19:41:36,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.22 | bwd_microstep: 5102.73 | bwd_inner_microstep: 5036.20 | 
bwd_allreduce_microstep: 66.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-29 19:41:45,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 19:41:45,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3228.96 | bwd_microstep: 4710.26 | bwd_inner_microstep: 4688.37 | bwd_allreduce_microstep: 21.83 | step_microstep: 180.93 [2024-07-29 19:41:45,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28283.05 | bwd: 40884.22 | bwd_inner: 39730.54 | bwd_allreduce: 1153.20 | step: 181.51 61%|██████ | 408/671 [7:58:31<5:20:37, 73.14s/it] {'loss': 1.1732, 'learning_rate': 7.048050003468252e-06, 'epoch': 0.61} 61%|██████ | 408/671 [7:58:31<5:20:37, 73.14s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3901 [2024-07-29 19:41:53,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.50 | bwd_microstep: 5210.13 | bwd_inner_microstep: 5149.21 | bwd_allreduce_microstep: 60.85 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2270 [2024-07-29 19:42:02,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.85 | bwd_microstep: 5189.60 | bwd_inner_microstep: 4784.99 | bwd_allreduce_microstep: 404.55 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3826 [2024-07-29 19:42:11,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.43 | bwd_microstep: 5051.84 | bwd_inner_microstep: 5032.57 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3725 [2024-07-29 19:42:20,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.66 | bwd_microstep: 5176.04 | bwd_inner_microstep: 5118.76 | bwd_allreduce_microstep: 57.22 | step_microstep: 0.08 dynamic ViT batch size: 20, 
images per sample: 10.0, dynamic token length: 3753 [2024-07-29 19:42:28,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.67 | bwd_microstep: 5098.41 | bwd_inner_microstep: 5055.18 | bwd_allreduce_microstep: 43.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3731 [2024-07-29 19:42:37,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.76 | bwd_microstep: 5157.37 | bwd_inner_microstep: 5102.52 | bwd_allreduce_microstep: 54.79 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3718 [2024-07-29 19:42:46,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.43 | bwd_microstep: 5171.96 | bwd_inner_microstep: 5100.15 | bwd_allreduce_microstep: 71.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-07-29 19:42:55,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.43 [2024-07-29 19:42:55,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.54 | bwd_microstep: 5058.34 | bwd_inner_microstep: 4994.32 | bwd_allreduce_microstep: 63.95 | step_microstep: 181.61 [2024-07-29 19:42:55,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28935.76 | bwd: 41113.69 | bwd_inner: 40337.64 | bwd_allreduce: 775.57 | step: 182.18 61%|██████ | 409/671 [7:59:41<5:15:46, 72.31s/it] {'loss': 1.1869, 'learning_rate': 7.001906293605329e-06, 'epoch': 0.61} 61%|██████ | 409/671 [7:59:41<5:15:46, 72.31s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4019 [2024-07-29 19:43:04,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.66 | bwd_microstep: 5085.09 | bwd_inner_microstep: 5065.98 | bwd_allreduce_microstep: 19.05 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3846 [2024-07-29 19:43:13,220] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.09 | bwd_microstep: 5285.27 | bwd_inner_microstep: 5200.00 | bwd_allreduce_microstep: 85.21 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2278 [2024-07-29 19:43:21,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.28 | bwd_microstep: 5197.90 | bwd_inner_microstep: 4792.45 | bwd_allreduce_microstep: 405.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-29 19:43:30,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.80 | bwd_microstep: 5188.67 | bwd_inner_microstep: 5109.25 | bwd_allreduce_microstep: 79.35 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-07-29 19:43:39,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.77 | bwd_microstep: 5057.16 | bwd_inner_microstep: 4665.27 | bwd_allreduce_microstep: 391.81 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193 [2024-07-29 19:43:48,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.31 | bwd_microstep: 5139.50 | bwd_inner_microstep: 4739.81 | bwd_allreduce_microstep: 399.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-29 19:43:56,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.87 | bwd_microstep: 5157.11 | bwd_inner_microstep: 5085.58 | bwd_allreduce_microstep: 71.46 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3714 [2024-07-29 19:44:05,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-29 19:44:05,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.46 | bwd_microstep: 4987.84 | bwd_inner_microstep: 4968.50 | 
bwd_allreduce_microstep: 19.26 | step_microstep: 181.13 [2024-07-29 19:44:05,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28884.11 | bwd: 41098.52 | bwd_inner: 39626.78 | bwd_allreduce: 1471.24 | step: 181.82 61%|██████ | 410/671 [8:00:51<5:11:57, 71.72s/it] {'loss': 1.1875, 'learning_rate': 6.9558326191062775e-06, 'epoch': 0.61} 61%|██████ | 410/671 [8:00:51<5:11:57, 71.72s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3857 [2024-07-29 19:44:14,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.01 | bwd_microstep: 5221.84 | bwd_inner_microstep: 5157.22 | bwd_allreduce_microstep: 64.56 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-29 19:44:23,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.62 | bwd_microstep: 5043.28 | bwd_inner_microstep: 5015.20 | bwd_allreduce_microstep: 28.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2241 [2024-07-29 19:44:32,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.88 | bwd_microstep: 5222.42 | bwd_inner_microstep: 4813.07 | bwd_allreduce_microstep: 409.29 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2202 [2024-07-29 19:44:40,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.44 | bwd_microstep: 5089.68 | bwd_inner_microstep: 4696.20 | bwd_allreduce_microstep: 393.41 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-07-29 19:44:49,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3469.65 | bwd_microstep: 5047.59 | bwd_inner_microstep: 4656.49 | bwd_allreduce_microstep: 391.04 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-07-29 19:44:58,006] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.21 | bwd_microstep: 5099.78 | bwd_inner_microstep: 4704.52 | bwd_allreduce_microstep: 395.20 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3670 [2024-07-29 19:45:06,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.56 | bwd_microstep: 5166.56 | bwd_inner_microstep: 5077.05 | bwd_allreduce_microstep: 89.45 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2160 [2024-07-29 19:45:15,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 19:45:15,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.31 | bwd_microstep: 5124.67 | bwd_inner_microstep: 4728.19 | bwd_allreduce_microstep: 396.41 | step_microstep: 182.29 [2024-07-29 19:45:15,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28553.58 | bwd: 41015.80 | bwd_inner: 38847.88 | bwd_allreduce: 2167.46 | step: 182.88 61%|██████▏ | 411/671 [8:02:01<5:08:24, 71.17s/it] {'loss': 1.1501, 'learning_rate': 6.909830056250527e-06, 'epoch': 0.61} 61%|██████▏ | 411/671 [8:02:01<5:08:24, 71.17s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2373 [2024-07-29 19:45:24,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.70 | bwd_microstep: 5224.88 | bwd_inner_microstep: 4822.32 | bwd_allreduce_microstep: 402.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3870 [2024-07-29 19:45:33,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.32 | bwd_microstep: 5234.23 | bwd_inner_microstep: 5185.53 | bwd_allreduce_microstep: 48.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-07-29 19:45:41,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3241.85 | 
bwd_microstep: 4849.19 | bwd_inner_microstep: 4821.60 | bwd_allreduce_microstep: 27.52 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2240 [2024-07-29 19:45:50,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.54 | bwd_microstep: 5199.04 | bwd_inner_microstep: 4796.15 | bwd_allreduce_microstep: 402.83 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-07-29 19:45:59,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.46 | bwd_microstep: 5161.67 | bwd_inner_microstep: 5105.12 | bwd_allreduce_microstep: 56.48 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723 [2024-07-29 19:46:07,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.35 | bwd_microstep: 5049.88 | bwd_inner_microstep: 5006.70 | bwd_allreduce_microstep: 43.11 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3674 [2024-07-29 19:46:16,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.85 | bwd_microstep: 5053.74 | bwd_inner_microstep: 4975.26 | bwd_allreduce_microstep: 78.42 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2148 [2024-07-29 19:46:25,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 19:46:25,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3491.87 | bwd_microstep: 5058.79 | bwd_inner_microstep: 4666.90 | bwd_allreduce_microstep: 391.83 | step_microstep: 180.89 [2024-07-29 19:46:25,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28278.83 | bwd: 40831.41 | bwd_inner: 39379.52 | bwd_allreduce: 1451.41 | step: 181.49 61%|██████▏ | 412/671 [8:03:11<5:04:58, 70.65s/it] {'loss': 1.1729, 'learning_rate': 6.8638996796563275e-06, 'epoch': 0.61} 61%|██████▏ | 412/671 
[8:03:11<5:04:58, 70.65s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2362 [2024-07-29 19:46:33,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.94 | bwd_microstep: 5233.49 | bwd_inner_microstep: 4826.46 | bwd_allreduce_microstep: 406.96 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3798 [2024-07-29 19:46:42,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.14 | bwd_microstep: 5080.76 | bwd_inner_microstep: 5013.29 | bwd_allreduce_microstep: 67.41 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3795 [2024-07-29 19:46:51,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.87 | bwd_microstep: 5039.42 | bwd_inner_microstep: 5019.08 | bwd_allreduce_microstep: 20.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-07-29 19:47:00,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.62 | bwd_microstep: 4989.89 | bwd_inner_microstep: 4970.53 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3763 [2024-07-29 19:47:08,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.33 | bwd_microstep: 5038.10 | bwd_inner_microstep: 5016.77 | bwd_allreduce_microstep: 21.25 | step_microstep: 0.10 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2176 [2024-07-29 19:47:17,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.51 | bwd_microstep: 5228.73 | bwd_inner_microstep: 4821.92 | bwd_allreduce_microstep: 406.73 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2184 [2024-07-29 19:47:25,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2999.46 | bwd_microstep: 4868.16 | 
bwd_inner_microstep: 4494.24 | bwd_allreduce_microstep: 373.85 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2188 [2024-07-29 19:47:34,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.95 [2024-07-29 19:47:34,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3479.73 | bwd_microstep: 5080.13 | bwd_inner_microstep: 4684.49 | bwd_allreduce_microstep: 395.57 | step_microstep: 182.58 [2024-07-29 19:47:34,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28410.49 | bwd: 40558.66 | bwd_inner: 38846.72 | bwd_allreduce: 1711.43 | step: 183.27 62%|██████▏ | 413/671 [8:04:20<5:02:03, 70.25s/it] {'loss': 1.1432, 'learning_rate': 6.81804256225567e-06, 'epoch': 0.61} 62%|██████▏ | 413/671 [8:04:20<5:02:03, 70.25s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3818 [2024-07-29 19:47:43,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3688.81 | bwd_microstep: 5338.96 | bwd_inner_microstep: 5271.56 | bwd_allreduce_microstep: 67.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3794 [2024-07-29 19:47:52,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.23 | bwd_microstep: 5171.71 | bwd_inner_microstep: 5119.34 | bwd_allreduce_microstep: 52.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3776 [2024-07-29 19:48:01,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.42 | bwd_microstep: 5013.38 | bwd_inner_microstep: 4992.18 | bwd_allreduce_microstep: 21.15 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2140 [2024-07-29 19:48:09,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.64 | bwd_microstep: 5187.83 | bwd_inner_microstep: 4785.26 | bwd_allreduce_microstep: 402.51 | step_microstep: 
0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-29 19:48:17,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3231.60 | bwd_microstep: 4829.63 | bwd_inner_microstep: 4805.39 | bwd_allreduce_microstep: 24.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2183 [2024-07-29 19:48:26,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.58 | bwd_microstep: 5113.30 | bwd_inner_microstep: 4717.17 | bwd_allreduce_microstep: 396.06 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2129 [2024-07-29 19:48:35,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.15 | bwd_microstep: 5099.38 | bwd_inner_microstep: 4701.85 | bwd_allreduce_microstep: 397.46 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149 [2024-07-29 19:48:43,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.43 [2024-07-29 19:48:43,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3022.80 | bwd_microstep: 4900.91 | bwd_inner_microstep: 4524.94 | bwd_allreduce_microstep: 375.90 | step_microstep: 180.68 [2024-07-29 19:48:43,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27868.13 | bwd: 40655.09 | bwd_inner: 38917.63 | bwd_allreduce: 1736.99 | step: 181.25 62%|██████▏ | 414/671 [8:05:29<4:59:05, 69.83s/it] {'loss': 1.1358, 'learning_rate': 6.7722597752692055e-06, 'epoch': 0.62} 62%|██████▏ | 414/671 [8:05:29<4:59:05, 69.83s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3932 [2024-07-29 19:48:52,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3832.45 | bwd_microstep: 5176.03 | bwd_inner_microstep: 5154.15 | bwd_allreduce_microstep: 21.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 
3799 [2024-07-29 19:49:00,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.15 | bwd_microstep: 5096.37 | bwd_inner_microstep: 5057.83 | bwd_allreduce_microstep: 38.48 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3741 [2024-07-29 19:49:09,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.70 | bwd_microstep: 5120.00 | bwd_inner_microstep: 5067.77 | bwd_allreduce_microstep: 52.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3597 [2024-07-29 19:49:17,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3163.50 | bwd_microstep: 4684.39 | bwd_inner_microstep: 4659.25 | bwd_allreduce_microstep: 25.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621 [2024-07-29 19:49:25,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3214.95 | bwd_microstep: 4844.95 | bwd_inner_microstep: 4797.34 | bwd_allreduce_microstep: 47.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-29 19:49:34,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.56 | bwd_microstep: 5063.47 | bwd_inner_microstep: 5005.47 | bwd_allreduce_microstep: 57.93 | step_microstep: 0.19 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2095 [2024-07-29 19:49:42,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.21 | bwd_microstep: 5096.94 | bwd_inner_microstep: 4699.30 | bwd_allreduce_microstep: 397.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-29 19:49:51,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 19:49:51,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3206.57 | bwd_microstep: 4716.77 | 
bwd_inner_microstep: 4691.55 | bwd_allreduce_microstep: 25.16 | step_microstep: 181.75 [2024-07-29 19:49:51,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27692.00 | bwd: 39798.91 | bwd_inner: 39132.59 | bwd_allreduce: 665.84 | step: 182.45 62%|██████▏ | 415/671 [8:06:37<4:55:21, 69.22s/it] {'loss': 1.143, 'learning_rate': 6.726552388181235e-06, 'epoch': 0.62} 62%|██████▏ | 415/671 [8:06:37<4:55:21, 69.22s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3905 [2024-07-29 19:49:59,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3281.23 | bwd_microstep: 4962.08 | bwd_inner_microstep: 4942.94 | bwd_allreduce_microstep: 19.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3838 [2024-07-29 19:50:08,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.81 | bwd_microstep: 5152.97 | bwd_inner_microstep: 5109.81 | bwd_allreduce_microstep: 43.09 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3748 [2024-07-29 19:50:16,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.45 | bwd_microstep: 5177.24 | bwd_inner_microstep: 5122.52 | bwd_allreduce_microstep: 54.64 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3636 [2024-07-29 19:50:25,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.27 | bwd_microstep: 5109.25 | bwd_inner_microstep: 5023.63 | bwd_allreduce_microstep: 85.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-29 19:50:33,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3197.43 | bwd_microstep: 4732.93 | bwd_inner_microstep: 4706.36 | bwd_allreduce_microstep: 26.51 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3637 [2024-07-29 
19:50:42,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.09 | bwd_microstep: 5134.27 | bwd_inner_microstep: 5033.83 | bwd_allreduce_microstep: 100.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-07-29 19:50:50,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.19 | bwd_microstep: 5005.64 | bwd_inner_microstep: 4958.53 | bwd_allreduce_microstep: 47.04 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2123 [2024-07-29 19:50:59,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 19:50:59,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.82 | bwd_microstep: 5168.08 | bwd_inner_microstep: 4765.81 | bwd_allreduce_microstep: 402.20 | step_microstep: 180.51 [2024-07-29 19:50:59,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27922.14 | bwd: 40442.43 | bwd_inner: 39663.38 | bwd_allreduce: 778.58 | step: 181.09 62%|██████▏ | 416/671 [8:07:45<4:53:31, 69.07s/it] {'loss': 1.1349, 'learning_rate': 6.6809214687147165e-06, 'epoch': 0.62} 62%|██████▏ | 416/671 [8:07:45<4:53:31, 69.07s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2462 [2024-07-29 19:51:08,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.22 | bwd_microstep: 5289.60 | bwd_inner_microstep: 4884.10 | bwd_allreduce_microstep: 405.44 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2290 [2024-07-29 19:51:17,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.71 | bwd_microstep: 5240.89 | bwd_inner_microstep: 4835.83 | bwd_allreduce_microstep: 404.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-29 19:51:26,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3628.20 | bwd_microstep: 5211.93 | bwd_inner_microstep: 5124.14 | bwd_allreduce_microstep: 87.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3763 [2024-07-29 19:51:34,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.68 | bwd_microstep: 4941.90 | bwd_inner_microstep: 4913.90 | bwd_allreduce_microstep: 27.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-07-29 19:51:43,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.41 | bwd_microstep: 5055.40 | bwd_inner_microstep: 4995.36 | bwd_allreduce_microstep: 59.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756 [2024-07-29 19:51:52,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.23 | bwd_microstep: 5021.78 | bwd_inner_microstep: 5001.54 | bwd_allreduce_microstep: 20.18 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2173 [2024-07-29 19:52:00,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3489.37 | bwd_microstep: 5054.84 | bwd_inner_microstep: 4663.00 | bwd_allreduce_microstep: 391.77 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2182 [2024-07-29 19:52:09,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 19:52:09,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.01 | bwd_microstep: 5200.73 | bwd_inner_microstep: 4796.11 | bwd_allreduce_microstep: 404.56 | step_microstep: 180.89 [2024-07-29 19:52:09,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28737.73 | bwd: 41017.05 | bwd_inner: 39213.91 | bwd_allreduce: 1802.66 | step: 181.48 62%|██████▏ | 417/671 [8:08:55<4:53:40, 69.37s/it] {'loss': 1.1159, 'learning_rate': 6.6353680828063306e-06, 'epoch': 0.62} 62%|██████▏ 
| 417/671 [8:08:55<4:53:40, 69.37s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-29 19:52:18,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.01 | bwd_microstep: 5441.95 | bwd_inner_microstep: 5349.76 | bwd_allreduce_microstep: 92.13 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3784 [2024-07-29 19:52:27,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.44 | bwd_microstep: 5139.81 | bwd_inner_microstep: 5098.49 | bwd_allreduce_microstep: 41.25 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2059 [2024-07-29 19:52:35,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3026.17 | bwd_microstep: 4996.47 | bwd_inner_microstep: 4611.84 | bwd_allreduce_microstep: 384.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3749 [2024-07-29 19:52:44,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.75 | bwd_microstep: 5115.08 | bwd_inner_microstep: 5070.46 | bwd_allreduce_microstep: 44.56 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3725 [2024-07-29 19:52:53,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.51 | bwd_microstep: 4982.85 | bwd_inner_microstep: 4963.49 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735 [2024-07-29 19:53:02,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.23 | bwd_microstep: 5022.35 | bwd_inner_microstep: 4997.48 | bwd_allreduce_microstep: 24.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650 [2024-07-29 19:53:09,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3172.92 | 
bwd_microstep: 4690.32 | bwd_inner_microstep: 4669.15 | bwd_allreduce_microstep: 21.11 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2127 [2024-07-29 19:53:18,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.38 [2024-07-29 19:53:18,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3460.15 | bwd_microstep: 5045.97 | bwd_inner_microstep: 4655.70 | bwd_allreduce_microstep: 390.20 | step_microstep: 181.33 [2024-07-29 19:53:18,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28013.08 | bwd: 40434.81 | bwd_inner: 39416.31 | bwd_allreduce: 1018.03 | step: 182.02 62%|██████▏ | 418/671 [8:10:04<4:51:45, 69.19s/it] {'loss': 1.1903, 'learning_rate': 6.589893294581579e-06, 'epoch': 0.62} 62%|██████▏ | 418/671 [8:10:04<4:51:45, 69.19s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3962 [2024-07-29 19:53:27,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3802.53 | bwd_microstep: 5190.64 | bwd_inner_microstep: 5171.42 | bwd_allreduce_microstep: 19.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3572 [2024-07-29 19:53:36,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.58 | bwd_microstep: 5180.37 | bwd_inner_microstep: 5093.78 | bwd_allreduce_microstep: 86.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2237 [2024-07-29 19:53:45,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.55 | bwd_microstep: 5176.85 | bwd_inner_microstep: 4775.10 | bwd_allreduce_microstep: 401.69 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2202 [2024-07-29 19:53:53,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3053.43 | bwd_microstep: 5048.48 | bwd_inner_microstep: 4660.68 | bwd_allreduce_microstep: 
387.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3744 [2024-07-29 19:54:01,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3246.14 | bwd_microstep: 4853.12 | bwd_inner_microstep: 4826.80 | bwd_allreduce_microstep: 26.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-07-29 19:54:10,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.68 | bwd_microstep: 5064.19 | bwd_inner_microstep: 5024.04 | bwd_allreduce_microstep: 40.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-29 19:54:18,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.73 | bwd_microstep: 5180.45 | bwd_inner_microstep: 5109.03 | bwd_allreduce_microstep: 71.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-29 19:54:27,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.73 [2024-07-29 19:54:27,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.43 | bwd_microstep: 4898.29 | bwd_inner_microstep: 4878.92 | bwd_allreduce_microstep: 19.30 | step_microstep: 180.67 [2024-07-29 19:54:27,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28208.99 | bwd: 40592.36 | bwd_inner: 39539.70 | bwd_allreduce: 1052.19 | step: 181.25 62%|██████▏ | 419/671 [8:11:13<4:50:32, 69.17s/it] {'loss': 1.1597, 'learning_rate': 6.5444981663299135e-06, 'epoch': 0.62} 62%|██████▏ | 419/671 [8:11:13<4:50:32, 69.17s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3938 [2024-07-29 19:54:36,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3814.51 | bwd_microstep: 5181.44 | bwd_inner_microstep: 5162.37 | bwd_allreduce_microstep: 19.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 
10.0, dynamic token length: 3813 [2024-07-29 19:54:44,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3241.45 | bwd_microstep: 4841.38 | bwd_inner_microstep: 4821.99 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2258 [2024-07-29 19:54:53,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.75 | bwd_microstep: 5214.47 | bwd_inner_microstep: 4810.43 | bwd_allreduce_microstep: 403.98 | step_microstep: 0.08 dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1586 [2024-07-29 19:55:02,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3489.51 | bwd_microstep: 5173.92 | bwd_inner_microstep: 4772.06 | bwd_allreduce_microstep: 401.79 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3738 [2024-07-29 19:55:10,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3190.56 | bwd_microstep: 4791.56 | bwd_inner_microstep: 4772.19 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3683 [2024-07-29 19:55:18,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3695.44 | bwd_microstep: 4911.54 | bwd_inner_microstep: 4887.59 | bwd_allreduce_microstep: 23.89 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2135 [2024-07-29 19:55:27,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.31 | bwd_microstep: 5116.02 | bwd_inner_microstep: 4721.34 | bwd_allreduce_microstep: 394.62 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3654 [2024-07-29 19:55:36,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-29 19:55:36,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.64 
| bwd_microstep: 4825.10 | bwd_inner_microstep: 4803.23 | bwd_allreduce_microstep: 21.81 | step_microstep: 180.73 [2024-07-29 19:55:36,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28128.08 | bwd: 40055.43 | bwd_inner: 38751.14 | bwd_allreduce: 1303.80 | step: 181.32 63%|██████▎ | 420/671 [8:12:22<4:48:32, 68.98s/it] {'loss': 1.1564, 'learning_rate': 6.499183758479944e-06, 'epoch': 0.63} 63%|██████▎ | 420/671 [8:12:22<4:48:32, 68.98s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3933 [2024-07-29 19:55:45,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3824.40 | bwd_microstep: 5181.35 | bwd_inner_microstep: 5162.21 | bwd_allreduce_microstep: 19.08 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2295 [2024-07-29 19:55:54,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.39 | bwd_microstep: 5220.28 | bwd_inner_microstep: 4816.24 | bwd_allreduce_microstep: 403.98 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2271 [2024-07-29 19:56:02,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.94 | bwd_microstep: 5218.30 | bwd_inner_microstep: 4812.36 | bwd_allreduce_microstep: 405.87 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607 [2024-07-29 19:56:11,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.37 | bwd_microstep: 5185.73 | bwd_inner_microstep: 5099.95 | bwd_allreduce_microstep: 85.71 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2227 [2024-07-29 19:56:20,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.52 | bwd_microstep: 5180.55 | bwd_inner_microstep: 4776.25 | bwd_allreduce_microstep: 404.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token 
length: 3726 [2024-07-29 19:56:29,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.11 | bwd_microstep: 5054.53 | bwd_inner_microstep: 5010.30 | bwd_allreduce_microstep: 44.16 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3714 [2024-07-29 19:56:37,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3706.54 | bwd_microstep: 4977.62 | bwd_inner_microstep: 4958.24 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2135 [2024-07-29 19:56:46,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-29 19:56:46,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.82 | bwd_microstep: 5224.06 | bwd_inner_microstep: 4817.56 | bwd_allreduce_microstep: 406.43 | step_microstep: 182.24 [2024-07-29 19:56:46,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28889.99 | bwd: 41242.40 | bwd_inner: 39453.05 | bwd_allreduce: 1788.88 | step: 182.93 63%|██████▎ | 421/671 [8:13:32<4:49:15, 69.42s/it] {'loss': 1.1286, 'learning_rate': 6.453951129574644e-06, 'epoch': 0.63} 63%|██████▎ | 421/671 [8:13:32<4:49:15, 69.42s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3899 [2024-07-29 19:56:55,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3690.61 | bwd_microstep: 5397.52 | bwd_inner_microstep: 5324.50 | bwd_allreduce_microstep: 72.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3577 [2024-07-29 19:57:04,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.71 | bwd_microstep: 5231.07 | bwd_inner_microstep: 5135.48 | bwd_allreduce_microstep: 95.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3595 [2024-07-29 19:57:13,499] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 3608.39 | bwd_microstep: 5166.61 | bwd_inner_microstep: 5086.36 | bwd_allreduce_microstep: 80.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642 [2024-07-29 19:57:21,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3232.91 | bwd_microstep: 4882.09 | bwd_inner_microstep: 4831.88 | bwd_allreduce_microstep: 50.15 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3764 [2024-07-29 19:57:30,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.80 | bwd_microstep: 4996.39 | bwd_inner_microstep: 4977.05 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2176 [2024-07-29 19:57:39,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.50 | bwd_microstep: 5182.11 | bwd_inner_microstep: 4779.39 | bwd_allreduce_microstep: 402.66 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-07-29 19:57:47,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3450.44 | bwd_microstep: 5022.51 | bwd_inner_microstep: 4634.22 | bwd_allreduce_microstep: 388.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723 [2024-07-29 19:57:56,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.44 [2024-07-29 19:57:56,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.39 | bwd_microstep: 5160.28 | bwd_inner_microstep: 5106.24 | bwd_allreduce_microstep: 53.97 | step_microstep: 181.92 [2024-07-29 19:57:56,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28506.65 | bwd: 41038.58 | bwd_inner: 39875.06 | bwd_allreduce: 1163.04 | step: 182.48 63%|██████▎ | 422/671 [8:14:42<4:48:39, 69.56s/it] {'loss': 1.1151, 'learning_rate': 6.408801336246645e-06, 
'epoch': 0.63} 63%|██████▎ | 422/671 [8:14:42<4:48:39, 69.56s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3936 [2024-07-29 19:58:04,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3277.22 | bwd_microstep: 4961.08 | bwd_inner_microstep: 4941.89 | bwd_allreduce_microstep: 19.13 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3865 [2024-07-29 19:58:13,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.39 | bwd_microstep: 5120.01 | bwd_inner_microstep: 5100.65 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3781 [2024-07-29 19:58:22,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.81 | bwd_microstep: 5049.81 | bwd_inner_microstep: 5030.39 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2223 [2024-07-29 19:58:31,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.46 | bwd_microstep: 5191.19 | bwd_inner_microstep: 4789.76 | bwd_allreduce_microstep: 401.36 | step_microstep: 0.10 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2167 [2024-07-29 19:58:39,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3043.45 | bwd_microstep: 4991.92 | bwd_inner_microstep: 4606.33 | bwd_allreduce_microstep: 385.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746 [2024-07-29 19:58:47,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3226.10 | bwd_microstep: 4801.56 | bwd_inner_microstep: 4782.16 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172 [2024-07-29 19:58:56,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3546.20 | bwd_microstep: 5213.46 | bwd_inner_microstep: 4807.44 | bwd_allreduce_microstep: 405.96 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-07-29 19:59:04,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 19:59:04,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3212.68 | bwd_microstep: 4728.77 | bwd_inner_microstep: 4701.83 | bwd_allreduce_microstep: 26.87 | step_microstep: 182.60 [2024-07-29 19:59:04,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27396.23 | bwd: 40057.78 | bwd_inner: 38760.40 | bwd_allreduce: 1296.91 | step: 183.19 63%|██████▎ | 423/671 [8:15:50<4:45:18, 69.03s/it] {'loss': 1.1358, 'learning_rate': 6.363735433193532e-06, 'epoch': 0.63} 63%|██████▎ | 423/671 [8:15:50<4:45:18, 69.03s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3904 [2024-07-29 19:59:13,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3475.27 | bwd_microstep: 5150.48 | bwd_inner_microstep: 5100.48 | bwd_allreduce_microstep: 49.93 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3580 [2024-07-29 19:59:21,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.13 | bwd_microstep: 5140.63 | bwd_inner_microstep: 5040.73 | bwd_allreduce_microstep: 99.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3618 [2024-07-29 19:59:30,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.72 | bwd_microstep: 5177.91 | bwd_inner_microstep: 5100.54 | bwd_allreduce_microstep: 77.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2221 [2024-07-29 19:59:38,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3064.23 | bwd_microstep: 5035.88 | bwd_inner_microstep: 4646.18 | 
bwd_allreduce_microstep: 389.63 | step_microstep: 0.20 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3596 [2024-07-29 19:59:47,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.28 | bwd_microstep: 5117.54 | bwd_inner_microstep: 5047.27 | bwd_allreduce_microstep: 70.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732 [2024-07-29 19:59:55,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3251.43 | bwd_microstep: 4833.75 | bwd_inner_microstep: 4809.34 | bwd_allreduce_microstep: 24.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-29 20:00:03,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3223.53 | bwd_microstep: 4829.98 | bwd_inner_microstep: 4791.43 | bwd_allreduce_microstep: 38.49 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2159 [2024-07-29 20:00:12,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 20:00:12,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.69 | bwd_microstep: 5117.74 | bwd_inner_microstep: 4720.12 | bwd_allreduce_microstep: 397.54 | step_microstep: 180.59 [2024-07-29 20:00:12,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27308.20 | bwd: 40403.89 | bwd_inner: 39256.05 | bwd_allreduce: 1147.37 | step: 181.28 63%|██████▎ | 424/671 [8:16:58<4:42:56, 68.73s/it] {'loss': 1.14, 'learning_rate': 6.318754473153224e-06, 'epoch': 0.63} 63%|██████▎ | 424/671 [8:16:58<4:42:56, 68.73s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3926 [2024-07-29 20:00:21,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3799.56 | bwd_microstep: 5176.93 | bwd_inner_microstep: 5157.89 | bwd_allreduce_microstep: 18.97 | step_microstep: 0.08 dynamic ViT batch size: 12, 
images per sample: 6.0, dynamic token length: 2294 [2024-07-29 20:00:29,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3000.99 | bwd_microstep: 4852.19 | bwd_inner_microstep: 4479.09 | bwd_allreduce_microstep: 373.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625 [2024-07-29 20:00:37,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.07 | bwd_microstep: 5099.75 | bwd_inner_microstep: 5033.61 | bwd_allreduce_microstep: 66.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2236 [2024-07-29 20:00:46,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3477.08 | bwd_microstep: 5113.23 | bwd_inner_microstep: 4717.75 | bwd_allreduce_microstep: 395.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677 [2024-07-29 20:00:55,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.89 | bwd_microstep: 5111.44 | bwd_inner_microstep: 5044.76 | bwd_allreduce_microstep: 66.61 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3688 [2024-07-29 20:01:03,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.57 | bwd_microstep: 4882.15 | bwd_inner_microstep: 4862.75 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680 [2024-07-29 20:01:12,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.09 | bwd_microstep: 5142.42 | bwd_inner_microstep: 5070.63 | bwd_allreduce_microstep: 71.73 | step_microstep: 0.07 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2153 [2024-07-29 20:01:21,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 20:01:21,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3528.46 | bwd_microstep: 5111.46 | bwd_inner_microstep: 4716.56 | bwd_allreduce_microstep: 394.83 | step_microstep: 180.66 [2024-07-29 20:01:21,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28220.62 | bwd: 40489.54 | bwd_inner: 39082.98 | bwd_allreduce: 1406.10 | step: 181.22 63%|██████▎ | 425/671 [8:18:07<4:42:10, 68.82s/it] {'loss': 1.1771, 'learning_rate': 6.273859506879365e-06, 'epoch': 0.63} 63%|██████▎ | 425/671 [8:18:07<4:42:10, 68.82s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2420 [2024-07-29 20:01:30,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.11 | bwd_microstep: 5433.42 | bwd_inner_microstep: 5017.51 | bwd_allreduce_microstep: 415.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3598 [2024-07-29 20:01:39,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.51 | bwd_microstep: 5146.19 | bwd_inner_microstep: 5067.69 | bwd_allreduce_microstep: 78.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3781 [2024-07-29 20:01:47,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3266.41 | bwd_microstep: 4966.23 | bwd_inner_microstep: 4929.62 | bwd_allreduce_microstep: 36.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3769 [2024-07-29 20:01:55,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3238.02 | bwd_microstep: 4811.81 | bwd_inner_microstep: 4792.14 | bwd_allreduce_microstep: 19.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3712 [2024-07-29 20:02:04,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.38 | bwd_microstep: 5051.34 | bwd_inner_microstep: 4992.58 | bwd_allreduce_microstep: 58.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 
13.0, dynamic token length: 3746 [2024-07-29 20:02:13,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.91 | bwd_microstep: 5012.98 | bwd_inner_microstep: 4993.60 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2135 [2024-07-29 20:02:21,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.54 | bwd_microstep: 5257.20 | bwd_inner_microstep: 4851.35 | bwd_allreduce_microstep: 405.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665 [2024-07-29 20:02:30,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-29 20:02:30,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.57 | bwd_microstep: 4991.85 | bwd_inner_microstep: 4939.17 | bwd_allreduce_microstep: 52.61 | step_microstep: 182.74 [2024-07-29 20:02:30,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28099.36 | bwd: 40671.00 | bwd_inner: 39583.60 | bwd_allreduce: 1086.92 | step: 183.31 63%|██████▎ | 426/671 [8:19:16<4:41:22, 68.91s/it] {'loss': 1.1251, 'learning_rate': 6.229051583116799e-06, 'epoch': 0.63} 63%|██████▎ | 426/671 [8:19:16<4:41:22, 68.91s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2240 [2024-07-29 20:02:39,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.40 | bwd_microstep: 5286.16 | bwd_inner_microstep: 4876.94 | bwd_allreduce_microstep: 409.16 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3793 [2024-07-29 20:02:48,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.37 | bwd_microstep: 5224.88 | bwd_inner_microstep: 5165.28 | bwd_allreduce_microstep: 59.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3766 [2024-07-29 20:02:57,107] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.28 | bwd_microstep: 5155.28 | bwd_inner_microstep: 5100.70 | bwd_allreduce_microstep: 54.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3831 [2024-07-29 20:03:05,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.39 | bwd_microstep: 5174.25 | bwd_inner_microstep: 5126.74 | bwd_allreduce_microstep: 47.44 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1181 [2024-07-29 20:03:13,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3015.49 | bwd_microstep: 5038.81 | bwd_inner_microstep: 4654.87 | bwd_allreduce_microstep: 383.87 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2207 [2024-07-29 20:03:22,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3434.03 | bwd_microstep: 5014.99 | bwd_inner_microstep: 4626.53 | bwd_allreduce_microstep: 388.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2183 [2024-07-29 20:03:31,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.00 | bwd_microstep: 5399.67 | bwd_inner_microstep: 4873.88 | bwd_allreduce_microstep: 525.73 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-07-29 20:03:40,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 20:03:40,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.82 | bwd_microstep: 5028.99 | bwd_inner_microstep: 4971.39 | bwd_allreduce_microstep: 57.53 | step_microstep: 180.66 [2024-07-29 20:03:40,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27956.70 | bwd: 41323.02 | bwd_inner: 39396.26 | bwd_allreduce: 1926.27 | step: 181.25 64%|██████▎ | 427/671 [8:20:26<4:41:04, 69.12s/it] {'loss': 1.0971, 
'learning_rate': 6.184331748577049e-06, 'epoch': 0.64} 64%|██████▎ | 427/671 [8:20:26<4:41:04, 69.12s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695 [2024-07-29 20:03:49,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.27 | bwd_microstep: 5255.75 | bwd_inner_microstep: 5175.51 | bwd_allreduce_microstep: 80.18 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2274 [2024-07-29 20:03:57,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.87 | bwd_microstep: 5163.87 | bwd_inner_microstep: 4762.55 | bwd_allreduce_microstep: 401.26 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3601 [2024-07-29 20:04:05,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3219.79 | bwd_microstep: 4863.26 | bwd_inner_microstep: 4815.42 | bwd_allreduce_microstep: 47.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3738 [2024-07-29 20:04:13,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3216.91 | bwd_microstep: 4791.28 | bwd_inner_microstep: 4771.83 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2184 [2024-07-29 20:04:22,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3041.60 | bwd_microstep: 5004.63 | bwd_inner_microstep: 4616.84 | bwd_allreduce_microstep: 387.72 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-29 20:04:30,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.07 | bwd_microstep: 5172.36 | bwd_inner_microstep: 4769.96 | bwd_allreduce_microstep: 402.33 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3690 [2024-07-29 20:04:39,314] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.06 | bwd_microstep: 5011.70 | bwd_inner_microstep: 4961.61 | bwd_allreduce_microstep: 50.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-29 20:04:48,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 20:04:48,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.44 | bwd_microstep: 4982.93 | bwd_inner_microstep: 4930.24 | bwd_allreduce_microstep: 52.62 | step_microstep: 181.39 [2024-07-29 20:04:48,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27274.91 | bwd: 40245.76 | bwd_inner: 38803.90 | bwd_allreduce: 1441.39 | step: 182.07 64%|██████▍ | 428/671 [8:21:33<4:38:22, 68.74s/it] {'loss': 1.0982, 'learning_rate': 6.139701047913885e-06, 'epoch': 0.64} 64%|██████▍ | 428/671 [8:21:33<4:38:22, 68.74s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2401 [2024-07-29 20:04:56,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.69 | bwd_microstep: 5280.41 | bwd_inner_microstep: 4872.34 | bwd_allreduce_microstep: 408.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3576 [2024-07-29 20:05:05,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.21 | bwd_microstep: 5167.56 | bwd_inner_microstep: 5083.93 | bwd_allreduce_microstep: 83.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3591 [2024-07-29 20:05:14,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.80 | bwd_microstep: 5155.06 | bwd_inner_microstep: 5078.66 | bwd_allreduce_microstep: 76.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607 [2024-07-29 20:05:23,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.57 | 
bwd_microstep: 5117.33 | bwd_inner_microstep: 5041.72 | bwd_allreduce_microstep: 75.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-29 20:05:31,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3190.29 | bwd_microstep: 4729.51 | bwd_inner_microstep: 4702.56 | bwd_allreduce_microstep: 26.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-29 20:05:39,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3251.93 | bwd_microstep: 4870.21 | bwd_inner_microstep: 4827.84 | bwd_allreduce_microstep: 42.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2183 [2024-07-29 20:05:47,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.97 | bwd_microstep: 5124.73 | bwd_inner_microstep: 4727.97 | bwd_allreduce_microstep: 396.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701 [2024-07-29 20:05:56,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 20:05:56,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3226.62 | bwd_microstep: 4860.59 | bwd_inner_microstep: 4820.20 | bwd_allreduce_microstep: 40.32 | step_microstep: 181.69 [2024-07-29 20:05:56,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27549.99 | bwd: 40305.37 | bwd_inner: 39155.15 | bwd_allreduce: 1149.76 | step: 182.26 64%|██████▍ | 429/671 [8:22:42<4:36:34, 68.57s/it] {'loss': 1.1474, 'learning_rate': 6.095160523698913e-06, 'epoch': 0.64} 64%|██████▍ | 429/671 [8:22:42<4:36:34, 68.57s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3969 [2024-07-29 20:06:04,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3341.85 | bwd_microstep: 5072.03 | bwd_inner_microstep: 5052.95 | bwd_allreduce_microstep: 
19.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3589 [2024-07-29 20:06:13,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.09 | bwd_microstep: 5187.45 | bwd_inner_microstep: 5110.15 | bwd_allreduce_microstep: 77.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3640 [2024-07-29 20:06:21,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3064.75 | bwd_microstep: 4910.21 | bwd_inner_microstep: 4854.53 | bwd_allreduce_microstep: 55.61 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2096 [2024-07-29 20:06:30,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3482.49 | bwd_microstep: 5127.49 | bwd_inner_microstep: 4729.01 | bwd_allreduce_microstep: 398.41 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3666 [2024-07-29 20:06:38,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3109.34 | bwd_microstep: 4972.47 | bwd_inner_microstep: 4915.74 | bwd_allreduce_microstep: 56.67 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-07-29 20:06:46,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.08 | bwd_microstep: 5157.38 | bwd_inner_microstep: 4756.75 | bwd_allreduce_microstep: 400.56 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3668 [2024-07-29 20:06:55,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.53 | bwd_microstep: 5010.81 | bwd_inner_microstep: 4940.97 | bwd_allreduce_microstep: 69.78 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2137 [2024-07-29 20:07:04,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 20:07:04,245] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.44 | bwd_microstep: 5069.92 | bwd_inner_microstep: 4678.44 | bwd_allreduce_microstep: 391.41 | step_microstep: 180.80 [2024-07-29 20:07:04,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27202.48 | bwd: 40507.74 | bwd_inner: 39038.48 | bwd_allreduce: 1468.79 | step: 181.38 64%|██████▍ | 430/671 [8:23:50<4:34:47, 68.41s/it] {'loss': 1.1362, 'learning_rate': 6.0507112163972106e-06, 'epoch': 0.64} 64%|██████▍ | 430/671 [8:23:50<4:34:47, 68.41s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3913 [2024-07-29 20:07:13,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3805.66 | bwd_microstep: 5140.94 | bwd_inner_microstep: 5121.81 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3583 [2024-07-29 20:07:21,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3213.69 | bwd_microstep: 4913.54 | bwd_inner_microstep: 4854.85 | bwd_allreduce_microstep: 58.63 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2098 [2024-07-29 20:07:30,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.84 | bwd_microstep: 5233.81 | bwd_inner_microstep: 4828.40 | bwd_allreduce_microstep: 405.34 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-07-29 20:07:38,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3684.09 | bwd_microstep: 4887.89 | bwd_inner_microstep: 4868.47 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3752 [2024-07-29 20:07:47,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.30 | bwd_microstep: 5013.84 | bwd_inner_microstep: 4994.49 | bwd_allreduce_microstep: 19.28 | step_microstep: 
0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704 [2024-07-29 20:07:56,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.11 | bwd_microstep: 4969.37 | bwd_inner_microstep: 4939.98 | bwd_allreduce_microstep: 29.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686 [2024-07-29 20:08:04,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.21 | bwd_microstep: 4961.84 | bwd_inner_microstep: 4928.36 | bwd_allreduce_microstep: 33.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3701 [2024-07-29 20:08:13,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-29 20:08:13,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3691.23 | bwd_microstep: 4901.30 | bwd_inner_microstep: 4881.81 | bwd_allreduce_microstep: 19.42 | step_microstep: 181.63 [2024-07-29 20:08:13,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29144.03 | bwd: 40022.52 | bwd_inner: 39418.11 | bwd_allreduce: 603.93 | step: 182.23 64%|██████▍ | 431/671 [8:24:59<4:34:56, 68.74s/it] {'loss': 1.1188, 'learning_rate': 6.006354164343047e-06, 'epoch': 0.64} 64%|██████▍ | 431/671 [8:24:59<4:34:56, 68.74s/it]dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 881 [2024-07-29 20:08:22,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.80 | bwd_microstep: 5600.30 | bwd_inner_microstep: 5169.04 | bwd_allreduce_microstep: 431.19 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3576 [2024-07-29 20:08:31,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3216.73 | bwd_microstep: 4874.84 | bwd_inner_microstep: 4819.54 | bwd_allreduce_microstep: 55.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2202 
[2024-07-29 20:08:38,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2962.63 | bwd_microstep: 4786.75 | bwd_inner_microstep: 4418.33 | bwd_allreduce_microstep: 368.36 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3639 [2024-07-29 20:08:47,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.48 | bwd_microstep: 5110.00 | bwd_inner_microstep: 5039.16 | bwd_allreduce_microstep: 70.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3745 [2024-07-29 20:08:55,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3273.88 | bwd_microstep: 4828.60 | bwd_inner_microstep: 4804.68 | bwd_allreduce_microstep: 23.86 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2223 [2024-07-29 20:09:04,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3470.88 | bwd_microstep: 5102.75 | bwd_inner_microstep: 4706.91 | bwd_allreduce_microstep: 395.78 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2163 [2024-07-29 20:09:12,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3473.67 | bwd_microstep: 5062.77 | bwd_inner_microstep: 4670.16 | bwd_allreduce_microstep: 392.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705 [2024-07-29 20:09:21,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 20:09:21,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.35 | bwd_microstep: 5064.19 | bwd_inner_microstep: 5005.53 | bwd_allreduce_microstep: 58.60 | step_microstep: 181.00 [2024-07-29 20:09:21,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27174.33 | bwd: 40430.20 | bwd_inner: 38633.29 | bwd_allreduce: 1796.44 | step: 181.67 64%|██████▍ | 432/671 [8:26:07<4:32:50, 
68.49s/it] {'loss': 1.1482, 'learning_rate': 5.962090403715592e-06, 'epoch': 0.64} 64%|██████▍ | 432/671 [8:26:07<4:32:50, 68.49s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2347 [2024-07-29 20:09:30,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.84 | bwd_microstep: 5404.87 | bwd_inner_microstep: 4989.49 | bwd_allreduce_microstep: 415.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3557 [2024-07-29 20:09:38,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3126.82 | bwd_microstep: 5022.47 | bwd_inner_microstep: 4931.24 | bwd_allreduce_microstep: 91.17 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2217 [2024-07-29 20:09:47,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.63 | bwd_microstep: 5167.99 | bwd_inner_microstep: 4764.25 | bwd_allreduce_microstep: 403.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3749 [2024-07-29 20:09:55,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3206.79 | bwd_microstep: 4789.53 | bwd_inner_microstep: 4770.22 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722 [2024-07-29 20:10:04,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.89 | bwd_microstep: 4985.02 | bwd_inner_microstep: 4965.67 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.07 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-29 20:10:12,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3679.01 | bwd_microstep: 4898.63 | bwd_inner_microstep: 4879.35 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-29 
20:10:20,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3194.46 | bwd_microstep: 4716.72 | bwd_inner_microstep: 4692.37 | bwd_allreduce_microstep: 24.28 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3645 [2024-07-29 20:10:29,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 20:10:29,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.69 | bwd_microstep: 5048.55 | bwd_inner_microstep: 4969.03 | bwd_allreduce_microstep: 79.45 | step_microstep: 180.89 [2024-07-29 20:10:29,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27666.04 | bwd: 40033.75 | bwd_inner: 38961.57 | bwd_allreduce: 1071.71 | step: 181.44 65%|██████▍ | 433/671 [8:27:15<4:31:07, 68.35s/it] {'loss': 1.1055, 'learning_rate': 5.9179209685147525e-06, 'epoch': 0.64} 65%|██████▍ | 433/671 [8:27:15<4:31:07, 68.35s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3992 [2024-07-29 20:10:38,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3850.56 | bwd_microstep: 5265.39 | bwd_inner_microstep: 5246.25 | bwd_allreduce_microstep: 19.07 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2263 [2024-07-29 20:10:47,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.36 | bwd_microstep: 5195.99 | bwd_inner_microstep: 4792.02 | bwd_allreduce_microstep: 403.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-07-29 20:10:56,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.04 | bwd_microstep: 5157.04 | bwd_inner_microstep: 5103.82 | bwd_allreduce_microstep: 53.16 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737 [2024-07-29 20:11:05,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3742.02 | bwd_microstep: 4982.81 | bwd_inner_microstep: 4963.52 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3799 [2024-07-29 20:11:13,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.60 | bwd_microstep: 5041.50 | bwd_inner_microstep: 5022.07 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3753 [2024-07-29 20:11:22,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.40 | bwd_microstep: 5181.57 | bwd_inner_microstep: 5128.45 | bwd_allreduce_microstep: 53.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-07-29 20:11:30,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3212.51 | bwd_microstep: 4810.26 | bwd_inner_microstep: 4778.81 | bwd_allreduce_microstep: 31.38 | step_microstep: 0.07 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722 [2024-07-29 20:11:39,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.74 [2024-07-29 20:11:39,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.10 | bwd_microstep: 4997.32 | bwd_inner_microstep: 4978.01 | bwd_allreduce_microstep: 19.24 | step_microstep: 181.10 [2024-07-29 20:11:39,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29073.51 | bwd: 40631.86 | bwd_inner: 40012.91 | bwd_allreduce: 618.48 | step: 181.65 65%|██████▍ | 434/671 [8:28:25<4:31:59, 68.86s/it] {'loss': 1.1933, 'learning_rate': 5.873846890536977e-06, 'epoch': 0.65} 65%|██████▍ | 434/671 [8:28:25<4:31:59, 68.86s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3866 [2024-07-29 20:11:48,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3653.24 | bwd_microstep: 5175.91 | bwd_inner_microstep: 5132.24 | 
bwd_allreduce_microstep: 43.59 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3571 [2024-07-29 20:11:57,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.82 | bwd_microstep: 5178.17 | bwd_inner_microstep: 5086.13 | bwd_allreduce_microstep: 91.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3759 [2024-07-29 20:12:06,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3769.76 | bwd_microstep: 5002.46 | bwd_inner_microstep: 4983.09 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3766 [2024-07-29 20:12:14,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.43 | bwd_microstep: 5009.74 | bwd_inner_microstep: 4990.24 | bwd_allreduce_microstep: 19.43 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-07-29 20:12:23,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.45 | bwd_microstep: 5025.58 | bwd_inner_microstep: 5001.81 | bwd_allreduce_microstep: 23.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3703 [2024-07-29 20:12:32,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.16 | bwd_microstep: 5160.49 | bwd_inner_microstep: 5076.56 | bwd_allreduce_microstep: 83.86 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-07-29 20:12:41,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.43 | bwd_microstep: 5106.90 | bwd_inner_microstep: 4708.97 | bwd_allreduce_microstep: 397.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-29 20:12:49,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.40 
[2024-07-29 20:12:49,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.34 | bwd_microstep: 5010.15 | bwd_inner_microstep: 4963.12 | bwd_allreduce_microstep: 46.97 | step_microstep: 181.07 [2024-07-29 20:12:49,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29223.54 | bwd: 40669.38 | bwd_inner: 39942.11 | bwd_allreduce: 726.79 | step: 181.65 65%|██████▍ | 435/671 [8:29:35<4:32:26, 69.27s/it] {'loss': 1.1561, 'learning_rate': 5.829869199351188e-06, 'epoch': 0.65} 65%|██████▍ | 435/671 [8:29:35<4:32:26, 69.27s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3802 [2024-07-29 20:12:59,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.37 | bwd_microstep: 5393.81 | bwd_inner_microstep: 5296.71 | bwd_allreduce_microstep: 97.03 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2271 [2024-07-29 20:13:07,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.01 | bwd_microstep: 5169.81 | bwd_inner_microstep: 4767.65 | bwd_allreduce_microstep: 402.09 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2076 [2024-07-29 20:13:16,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.08 | bwd_microstep: 5224.24 | bwd_inner_microstep: 4818.21 | bwd_allreduce_microstep: 405.96 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2275 [2024-07-29 20:13:25,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.85 | bwd_microstep: 5230.42 | bwd_inner_microstep: 4823.25 | bwd_allreduce_microstep: 407.11 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3639 [2024-07-29 20:13:34,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.00 | bwd_microstep: 5181.23 | bwd_inner_microstep: 5083.65 | bwd_allreduce_microstep: 
97.52 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-29 20:13:42,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.40 | bwd_microstep: 5171.15 | bwd_inner_microstep: 5112.96 | bwd_allreduce_microstep: 58.13 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3694 [2024-07-29 20:13:51,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.83 | bwd_microstep: 5169.90 | bwd_inner_microstep: 5079.66 | bwd_allreduce_microstep: 90.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-07-29 20:14:00,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 20:14:00,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.17 | bwd_microstep: 5127.35 | bwd_inner_microstep: 5058.28 | bwd_allreduce_microstep: 69.01 | step_microstep: 180.60 [2024-07-29 20:14:00,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28729.62 | bwd: 41667.90 | bwd_inner: 40040.31 | bwd_allreduce: 1627.11 | step: 181.30 65%|██████▍ | 436/671 [8:30:46<4:32:59, 69.70s/it] {'loss': 1.1167, 'learning_rate': 5.785988922274711e-06, 'epoch': 0.65} 65%|██████▍ | 436/671 [8:30:46<4:32:59, 69.70s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3936 [2024-07-29 20:14:09,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.12 | bwd_microstep: 5247.09 | bwd_inner_microstep: 5201.25 | bwd_allreduce_microstep: 45.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3824 [2024-07-29 20:14:17,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3245.46 | bwd_microstep: 4871.47 | bwd_inner_microstep: 4850.02 | bwd_allreduce_microstep: 21.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3799 [2024-07-29 20:14:26,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.87 | bwd_microstep: 5183.50 | bwd_inner_microstep: 5133.38 | bwd_allreduce_microstep: 50.05 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2199 [2024-07-29 20:14:35,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.10 | bwd_microstep: 5214.52 | bwd_inner_microstep: 4809.37 | bwd_allreduce_microstep: 405.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-07-29 20:14:44,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.23 | bwd_microstep: 5131.37 | bwd_inner_microstep: 5061.36 | bwd_allreduce_microstep: 69.94 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-07-29 20:14:52,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3016.82 | bwd_microstep: 4908.37 | bwd_inner_microstep: 4532.13 | bwd_allreduce_microstep: 376.17 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3205 [2024-07-29 20:15:00,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3069.69 | bwd_microstep: 4862.09 | bwd_inner_microstep: 4771.63 | bwd_allreduce_microstep: 90.39 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2151 [2024-07-29 20:15:08,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-29 20:15:08,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3467.87 | bwd_microstep: 5055.72 | bwd_inner_microstep: 4663.45 | bwd_allreduce_microstep: 392.21 | step_microstep: 180.93 [2024-07-29 20:15:08,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27273.09 | bwd: 40474.11 | bwd_inner: 39022.53 | bwd_allreduce: 1451.11 | step: 181.50 65%|██████▌ | 
437/671 [8:31:54<4:29:55, 69.21s/it] {'loss': 1.1961, 'learning_rate': 5.742207084349274e-06, 'epoch': 0.65} 65%|██████▌ | 437/671 [8:31:54<4:29:55, 69.21s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2369 [2024-07-29 20:15:17,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.76 | bwd_microstep: 5352.67 | bwd_inner_microstep: 4939.05 | bwd_allreduce_microstep: 413.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3568 [2024-07-29 20:15:26,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.59 | bwd_microstep: 5249.06 | bwd_inner_microstep: 5158.48 | bwd_allreduce_microstep: 90.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3611 [2024-07-29 20:15:35,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.13 | bwd_microstep: 5216.82 | bwd_inner_microstep: 5129.77 | bwd_allreduce_microstep: 86.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754 [2024-07-29 20:15:44,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.13 | bwd_microstep: 5080.51 | bwd_inner_microstep: 5052.37 | bwd_allreduce_microstep: 28.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3759 [2024-07-29 20:15:53,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.25 | bwd_microstep: 5137.53 | bwd_inner_microstep: 5090.38 | bwd_allreduce_microstep: 47.09 | step_microstep: 0.08 dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 639 [2024-07-29 20:16:01,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3500.36 | bwd_microstep: 5337.34 | bwd_inner_microstep: 4928.96 | bwd_allreduce_microstep: 408.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 
[2024-07-29 20:16:10,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.89 | bwd_microstep: 5158.75 | bwd_inner_microstep: 5088.87 | bwd_allreduce_microstep: 69.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-07-29 20:16:19,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 20:16:19,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.85 | bwd_microstep: 5023.86 | bwd_inner_microstep: 4973.02 | bwd_allreduce_microstep: 50.78 | step_microstep: 180.90 [2024-07-29 20:16:19,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28849.83 | bwd: 41556.52 | bwd_inner: 40360.84 | bwd_allreduce: 1195.22 | step: 181.47 65%|██████▌ | 438/671 [8:33:05<4:30:32, 69.67s/it] {'loss': 1.1883, 'learning_rate': 5.698524708317082e-06, 'epoch': 0.65} 65%|██████▌ | 438/671 [8:33:05<4:30:32, 69.67s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-07-29 20:16:28,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.31 | bwd_microstep: 5322.35 | bwd_inner_microstep: 5250.45 | bwd_allreduce_microstep: 71.84 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3830 [2024-07-29 20:16:36,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3211.96 | bwd_microstep: 4839.68 | bwd_inner_microstep: 4820.38 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.07 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2277 [2024-07-29 20:16:45,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.42 | bwd_microstep: 5246.17 | bwd_inner_microstep: 4837.46 | bwd_allreduce_microstep: 408.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3774 [2024-07-29 20:16:54,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3623.99 | bwd_microstep: 5177.78 | bwd_inner_microstep: 5121.77 | bwd_allreduce_microstep: 55.94 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3746 [2024-07-29 20:17:02,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.42 | bwd_microstep: 5003.68 | bwd_inner_microstep: 4984.31 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728 [2024-07-29 20:17:11,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.42 | bwd_microstep: 5194.17 | bwd_inner_microstep: 5135.51 | bwd_allreduce_microstep: 58.59 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-29 20:17:19,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3061.72 | bwd_microstep: 5035.51 | bwd_inner_microstep: 4648.60 | bwd_allreduce_microstep: 386.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705 [2024-07-29 20:17:28,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-29 20:17:28,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.95 | bwd_microstep: 5168.52 | bwd_inner_microstep: 5093.14 | bwd_allreduce_microstep: 75.30 | step_microstep: 181.15 [2024-07-29 20:17:28,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28131.10 | bwd: 40987.82 | bwd_inner: 39891.54 | bwd_allreduce: 1095.76 | step: 181.73 65%|██████▌ | 439/671 [8:34:14<4:29:07, 69.60s/it] {'loss': 1.1935, 'learning_rate': 5.654942814596902e-06, 'epoch': 0.65} 65%|██████▌ | 439/671 [8:34:14<4:29:07, 69.60s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3901 [2024-07-29 20:17:37,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3797.42 | bwd_microstep: 5139.83 | bwd_inner_microstep: 5120.61 | 
bwd_allreduce_microstep: 19.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3581 [2024-07-29 20:17:46,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.23 | bwd_microstep: 5185.08 | bwd_inner_microstep: 5100.17 | bwd_allreduce_microstep: 84.83 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2251 [2024-07-29 20:17:55,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.75 | bwd_microstep: 5166.17 | bwd_inner_microstep: 4762.24 | bwd_allreduce_microstep: 403.85 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1207 [2024-07-29 20:18:03,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3393.40 | bwd_microstep: 5047.10 | bwd_inner_microstep: 4658.15 | bwd_allreduce_microstep: 388.88 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3727 [2024-07-29 20:18:12,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.21 | bwd_microstep: 5156.04 | bwd_inner_microstep: 5085.92 | bwd_allreduce_microstep: 70.06 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3727 [2024-07-29 20:18:21,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.75 | bwd_microstep: 5020.75 | bwd_inner_microstep: 4978.02 | bwd_allreduce_microstep: 42.66 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2119 [2024-07-29 20:18:30,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.58 | bwd_microstep: 5203.51 | bwd_inner_microstep: 4799.00 | bwd_allreduce_microstep: 404.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3694 [2024-07-29 20:18:38,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 
[2024-07-29 20:18:38,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.34 | bwd_microstep: 5049.65 | bwd_inner_microstep: 4995.34 | bwd_allreduce_microstep: 54.24 | step_microstep: 181.57 [2024-07-29 20:18:38,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28650.59 | bwd: 40968.10 | bwd_inner: 39499.38 | bwd_allreduce: 1468.21 | step: 182.25 66%|██████▌ | 440/671 [8:35:24<4:28:21, 69.71s/it] {'loss': 1.0944, 'learning_rate': 5.611462421260251e-06, 'epoch': 0.65} 66%|██████▌ | 440/671 [8:35:24<4:28:21, 69.71s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3905 [2024-07-29 20:18:47,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3658.40 | bwd_microstep: 5257.44 | bwd_inner_microstep: 5217.96 | bwd_allreduce_microstep: 39.41 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2281 [2024-07-29 20:18:56,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.34 | bwd_microstep: 5217.04 | bwd_inner_microstep: 4812.34 | bwd_allreduce_microstep: 404.63 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-07-29 20:19:05,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.38 | bwd_microstep: 5169.03 | bwd_inner_microstep: 5089.77 | bwd_allreduce_microstep: 79.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3753 [2024-07-29 20:19:14,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.96 | bwd_microstep: 5110.59 | bwd_inner_microstep: 5065.10 | bwd_allreduce_microstep: 45.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3783 [2024-07-29 20:19:22,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.78 | bwd_microstep: 5103.62 | bwd_inner_microstep: 5062.77 | 
bwd_allreduce_microstep: 40.78 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2203 [2024-07-29 20:19:31,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3479.72 | bwd_microstep: 5111.65 | bwd_inner_microstep: 4715.88 | bwd_allreduce_microstep: 395.70 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-29 20:19:40,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3499.70 | bwd_microstep: 5142.81 | bwd_inner_microstep: 4741.79 | bwd_allreduce_microstep: 400.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674 [2024-07-29 20:19:48,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-29 20:19:48,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.68 | bwd_microstep: 5003.71 | bwd_inner_microstep: 4949.39 | bwd_allreduce_microstep: 54.25 | step_microstep: 180.99 [2024-07-29 20:19:48,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28496.85 | bwd: 41115.85 | bwd_inner: 39654.94 | bwd_allreduce: 1460.45 | step: 181.56 66%|██████▌ | 441/671 [8:36:34<4:27:28, 69.78s/it] {'loss': 1.1319, 'learning_rate': 5.5680845440075885e-06, 'epoch': 0.66} 66%|██████▌ | 441/671 [8:36:34<4:27:28, 69.78s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3950 [2024-07-29 20:19:57,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3303.53 | bwd_microstep: 4979.21 | bwd_inner_microstep: 4960.11 | bwd_allreduce_microstep: 19.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3793 [2024-07-29 20:20:05,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.83 | bwd_microstep: 5022.17 | bwd_inner_microstep: 5002.74 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 14, 
images per sample: 7.0, dynamic token length: 2160 [2024-07-29 20:20:14,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3493.70 | bwd_microstep: 5133.54 | bwd_inner_microstep: 4733.02 | bwd_allreduce_microstep: 400.46 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2182 [2024-07-29 20:20:23,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3475.08 | bwd_microstep: 5048.73 | bwd_inner_microstep: 4654.43 | bwd_allreduce_microstep: 394.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730 [2024-07-29 20:20:31,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.26 | bwd_microstep: 4978.60 | bwd_inner_microstep: 4959.21 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-07-29 20:20:40,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.85 | bwd_microstep: 5027.87 | bwd_inner_microstep: 4970.09 | bwd_allreduce_microstep: 57.72 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-07-29 20:20:48,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3443.98 | bwd_microstep: 5014.68 | bwd_inner_microstep: 4625.91 | bwd_allreduce_microstep: 388.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674 [2024-07-29 20:20:57,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.46 [2024-07-29 20:20:57,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.98 | bwd_microstep: 5054.95 | bwd_inner_microstep: 4994.83 | bwd_allreduce_microstep: 60.05 | step_microstep: 180.39 [2024-07-29 20:20:57,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28300.12 | bwd: 40259.73 | bwd_inner: 38900.28 | bwd_allreduce: 1358.98 | 
step: 180.95 66%|██████▌ | 442/671 [8:37:43<4:25:17, 69.51s/it] {'loss': 1.2082, 'learning_rate': 5.5248101961446065e-06, 'epoch': 0.66} 66%|██████▌ | 442/671 [8:37:43<4:25:17, 69.51s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3880 [2024-07-29 20:21:06,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3786.35 | bwd_microstep: 5125.72 | bwd_inner_microstep: 5106.04 | bwd_allreduce_microstep: 19.62 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2235 [2024-07-29 20:21:14,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3043.76 | bwd_microstep: 5019.63 | bwd_inner_microstep: 4630.65 | bwd_allreduce_microstep: 388.91 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3780 [2024-07-29 20:21:23,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.83 | bwd_microstep: 5092.65 | bwd_inner_microstep: 5050.62 | bwd_allreduce_microstep: 41.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-07-29 20:21:31,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3222.58 | bwd_microstep: 4815.92 | bwd_inner_microstep: 4796.58 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2217 [2024-07-29 20:21:39,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3470.30 | bwd_microstep: 5045.12 | bwd_inner_microstep: 4651.72 | bwd_allreduce_microstep: 393.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754 [2024-07-29 20:21:48,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.50 | bwd_microstep: 5008.73 | bwd_inner_microstep: 4989.39 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 
10.0, dynamic token length: 3717 [2024-07-29 20:21:57,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.72 | bwd_microstep: 5065.12 | bwd_inner_microstep: 5024.13 | bwd_allreduce_microstep: 40.93 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2159 [2024-07-29 20:22:06,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 20:22:06,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.66 | bwd_microstep: 5165.84 | bwd_inner_microstep: 4764.34 | bwd_allreduce_microstep: 401.44 | step_microstep: 189.21 [2024-07-29 20:22:06,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27957.61 | bwd: 40338.72 | bwd_inner: 39013.40 | bwd_allreduce: 1324.84 | step: 189.79 66%|██████▌ | 443/671 [8:38:52<4:23:07, 69.24s/it] {'loss': 1.1499, 'learning_rate': 5.481640388558551e-06, 'epoch': 0.66} 66%|██████▌ | 443/671 [8:38:52<4:23:07, 69.24s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2350 [2024-07-29 20:22:14,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3128.31 | bwd_microstep: 5202.95 | bwd_inner_microstep: 4803.77 | bwd_allreduce_microstep: 399.12 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3859 [2024-07-29 20:22:23,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.61 | bwd_microstep: 5102.88 | bwd_inner_microstep: 5083.54 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2042 [2024-07-29 20:22:31,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3046.10 | bwd_microstep: 5037.04 | bwd_inner_microstep: 4651.70 | bwd_allreduce_microstep: 385.27 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3622 [2024-07-29 20:22:40,484] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.42 | bwd_microstep: 5200.67 | bwd_inner_microstep: 5103.82 | bwd_allreduce_microstep: 96.79 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3775 [2024-07-29 20:22:49,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.63 | bwd_microstep: 5022.09 | bwd_inner_microstep: 5002.79 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3740 [2024-07-29 20:22:57,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.23 | bwd_microstep: 4890.55 | bwd_inner_microstep: 4865.36 | bwd_allreduce_microstep: 25.12 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-29 20:23:06,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3467.73 | bwd_microstep: 5064.10 | bwd_inner_microstep: 4671.45 | bwd_allreduce_microstep: 392.59 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2136 [2024-07-29 20:23:15,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 20:23:15,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.08 | bwd_microstep: 5112.10 | bwd_inner_microstep: 4714.61 | bwd_allreduce_microstep: 397.42 | step_microstep: 181.96 [2024-07-29 20:23:15,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27850.02 | bwd: 40632.36 | bwd_inner: 38896.98 | bwd_allreduce: 1734.91 | step: 182.56 66%|██████▌ | 444/671 [8:40:01<4:21:28, 69.11s/it] {'loss': 1.0991, 'learning_rate': 5.43857612969462e-06, 'epoch': 0.66} 66%|██████▌ | 444/671 [8:40:01<4:21:28, 69.11s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3888 [2024-07-29 20:23:24,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.60 | 
bwd_microstep: 5320.85 | bwd_inner_microstep: 5256.92 | bwd_allreduce_microstep: 63.87 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3745 [2024-07-29 20:23:32,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.77 | bwd_microstep: 5159.57 | bwd_inner_microstep: 5104.66 | bwd_allreduce_microstep: 54.85 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2200 [2024-07-29 20:23:41,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.26 | bwd_microstep: 5237.02 | bwd_inner_microstep: 4827.12 | bwd_allreduce_microstep: 409.83 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729 [2024-07-29 20:23:50,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.54 | bwd_microstep: 4983.30 | bwd_inner_microstep: 4963.97 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-29 20:23:59,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.54 | bwd_microstep: 5133.14 | bwd_inner_microstep: 5056.49 | bwd_allreduce_microstep: 76.58 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2189 [2024-07-29 20:24:07,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3038.82 | bwd_microstep: 4974.77 | bwd_inner_microstep: 4591.63 | bwd_allreduce_microstep: 383.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201 [2024-07-29 20:24:15,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3488.97 | bwd_microstep: 5061.72 | bwd_inner_microstep: 4669.81 | bwd_allreduce_microstep: 391.84 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-29 20:24:24,556] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-29 20:24:24,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3659.16 | bwd_microstep: 4885.04 | bwd_inner_microstep: 4865.73 | bwd_allreduce_microstep: 19.24 | step_microstep: 180.98 [2024-07-29 20:24:24,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28340.56 | bwd: 40755.39 | bwd_inner: 39336.27 | bwd_allreduce: 1418.64 | step: 181.66 66%|██████▋ | 445/671 [8:41:10<4:20:41, 69.21s/it] {'loss': 1.1476, 'learning_rate': 5.3956184255323855e-06, 'epoch': 0.66} 66%|██████▋ | 445/671 [8:41:10<4:20:41, 69.21s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2275 [2024-07-29 20:24:33,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.50 | bwd_microstep: 5195.13 | bwd_inner_microstep: 4795.26 | bwd_allreduce_microstep: 399.80 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-07-29 20:24:42,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.77 | bwd_microstep: 5133.62 | bwd_inner_microstep: 5086.92 | bwd_allreduce_microstep: 46.63 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3753 [2024-07-29 20:24:50,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.20 | bwd_microstep: 5167.54 | bwd_inner_microstep: 5092.85 | bwd_allreduce_microstep: 74.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-29 20:24:59,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.11 | bwd_microstep: 5060.60 | bwd_inner_microstep: 4993.70 | bwd_allreduce_microstep: 66.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-29 20:25:07,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3198.60 | 
bwd_microstep: 4694.77 | bwd_inner_microstep: 4669.58 | bwd_allreduce_microstep: 25.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-29 20:25:15,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3468.36 | bwd_microstep: 5049.69 | bwd_inner_microstep: 4658.45 | bwd_allreduce_microstep: 391.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158 [2024-07-29 20:25:24,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.30 | bwd_microstep: 5213.51 | bwd_inner_microstep: 4806.17 | bwd_allreduce_microstep: 407.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693 [2024-07-29 20:25:33,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-29 20:25:33,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.64 | bwd_microstep: 4911.25 | bwd_inner_microstep: 4887.33 | bwd_allreduce_microstep: 23.86 | step_microstep: 181.94 [2024-07-29 20:25:33,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28234.38 | bwd: 40426.09 | bwd_inner: 38990.19 | bwd_allreduce: 1435.43 | step: 182.52 66%|██████▋ | 446/671 [8:42:19<4:19:17, 69.14s/it] {'loss': 1.1776, 'learning_rate': 5.352768279562315e-06, 'epoch': 0.66} 66%|██████▋ | 446/671 [8:42:19<4:19:17, 69.14s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3870 [2024-07-29 20:25:42,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.07 | bwd_microstep: 5166.48 | bwd_inner_microstep: 5129.21 | bwd_allreduce_microstep: 37.21 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2270 [2024-07-29 20:25:51,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.81 | bwd_microstep: 5167.09 | bwd_inner_microstep: 4765.97 | bwd_allreduce_microstep: 
401.06 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2199 [2024-07-29 20:25:59,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3475.19 | bwd_microstep: 5125.11 | bwd_inner_microstep: 4728.16 | bwd_allreduce_microstep: 396.89 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2223 [2024-07-29 20:26:08,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.80 | bwd_microstep: 5256.38 | bwd_inner_microstep: 4847.55 | bwd_allreduce_microstep: 408.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728 [2024-07-29 20:26:17,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.93 | bwd_microstep: 4993.53 | bwd_inner_microstep: 4958.91 | bwd_allreduce_microstep: 34.56 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3694 [2024-07-29 20:26:25,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.67 | bwd_microstep: 4953.52 | bwd_inner_microstep: 4921.00 | bwd_allreduce_microstep: 32.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-29 20:26:34,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.79 | bwd_microstep: 5103.70 | bwd_inner_microstep: 5039.24 | bwd_allreduce_microstep: 64.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171 [2024-07-29 20:26:43,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 20:26:43,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.93 | bwd_microstep: 5118.74 | bwd_inner_microstep: 4722.33 | bwd_allreduce_microstep: 396.34 | step_microstep: 180.84 [2024-07-29 20:26:43,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28669.09 | bwd: 
40884.54 | bwd_inner: 39112.30 | bwd_allreduce: 1771.76 | step: 181.41 67%|██████▋ | 447/671 [8:43:29<4:18:57, 69.36s/it] {'loss': 1.1296, 'learning_rate': 5.310026692762316e-06, 'epoch': 0.67} 67%|██████▋ | 447/671 [8:43:29<4:18:57, 69.36s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2310 [2024-07-29 20:26:52,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.35 | bwd_microstep: 5215.84 | bwd_inner_microstep: 4815.41 | bwd_allreduce_microstep: 400.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3585 [2024-07-29 20:27:01,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.95 | bwd_microstep: 5202.17 | bwd_inner_microstep: 5115.14 | bwd_allreduce_microstep: 86.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622 [2024-07-29 20:27:09,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.99 | bwd_microstep: 5102.80 | bwd_inner_microstep: 5030.19 | bwd_allreduce_microstep: 72.53 | step_microstep: 0.08 dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2083 [2024-07-29 20:27:18,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3442.89 | bwd_microstep: 5036.16 | bwd_inner_microstep: 4647.67 | bwd_allreduce_microstep: 388.42 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3647 [2024-07-29 20:27:26,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3131.49 | bwd_microstep: 4671.73 | bwd_inner_microstep: 4647.75 | bwd_allreduce_microstep: 23.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-29 20:27:34,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.80 | bwd_microstep: 4998.40 | bwd_inner_microstep: 4948.92 | bwd_allreduce_microstep: 49.43 | 
step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2184 [2024-07-29 20:27:43,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3493.61 | bwd_microstep: 5132.07 | bwd_inner_microstep: 4733.24 | bwd_allreduce_microstep: 398.76 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-07-29 20:27:52,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 20:27:52,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3711.88 | bwd_microstep: 4914.63 | bwd_inner_microstep: 4890.01 | bwd_allreduce_microstep: 24.56 | step_microstep: 181.92 [2024-07-29 20:27:52,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28045.86 | bwd: 40273.76 | bwd_inner: 38828.24 | bwd_allreduce: 1445.03 | step: 182.49 67%|██████▋ | 448/671 [8:44:38<4:17:00, 69.15s/it] {'loss': 1.087, 'learning_rate': 5.267394663574351e-06, 'epoch': 0.67} 67%|██████▋ | 448/671 [8:44:38<4:17:00, 69.15s/it]dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3595 [2024-07-29 20:28:00,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3140.61 | bwd_microstep: 4982.63 | bwd_inner_microstep: 4926.15 | bwd_allreduce_microstep: 56.42 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3808 [2024-07-29 20:28:09,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.53 | bwd_microstep: 5201.51 | bwd_inner_microstep: 5118.25 | bwd_allreduce_microstep: 83.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3588 [2024-07-29 20:28:17,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.64 | bwd_microstep: 5216.58 | bwd_inner_microstep: 5125.83 | bwd_allreduce_microstep: 90.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic 
token length: 3779 [2024-07-29 20:28:26,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.97 | bwd_microstep: 5121.97 | bwd_inner_microstep: 5077.69 | bwd_allreduce_microstep: 44.21 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754 [2024-07-29 20:28:35,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3780.55 | bwd_microstep: 5090.58 | bwd_inner_microstep: 5061.14 | bwd_allreduce_microstep: 29.38 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2226 [2024-07-29 20:28:43,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3067.16 | bwd_microstep: 5004.99 | bwd_inner_microstep: 4618.42 | bwd_allreduce_microstep: 386.49 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2147 [2024-07-29 20:28:51,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3046.06 | bwd_microstep: 5008.19 | bwd_inner_microstep: 4624.21 | bwd_allreduce_microstep: 383.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-29 20:29:00,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 20:29:00,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.00 | bwd_microstep: 5062.84 | bwd_inner_microstep: 5000.59 | bwd_allreduce_microstep: 62.19 | step_microstep: 181.01 [2024-07-29 20:29:00,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27440.43 | bwd: 40689.27 | bwd_inner: 39552.23 | bwd_allreduce: 1136.58 | step: 181.59 67%|██████▋ | 449/671 [8:45:46<4:15:05, 68.94s/it] {'loss': 1.1738, 'learning_rate': 5.224873187881136e-06, 'epoch': 0.67} 67%|██████▋ | 449/671 [8:45:46<4:15:05, 68.94s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3843 [2024-07-29 20:29:09,565] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 3844.19 | bwd_microstep: 5160.70 | bwd_inner_microstep: 5134.97 | bwd_allreduce_microstep: 25.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3578 [2024-07-29 20:29:18,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.93 | bwd_microstep: 5148.19 | bwd_inner_microstep: 5062.18 | bwd_allreduce_microstep: 85.95 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2261 [2024-07-29 20:29:27,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.25 | bwd_microstep: 5182.44 | bwd_inner_microstep: 4779.01 | bwd_allreduce_microstep: 403.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-29 20:29:35,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3711.97 | bwd_microstep: 4976.66 | bwd_inner_microstep: 4957.30 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-07-29 20:29:44,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.99 | bwd_microstep: 5142.32 | bwd_inner_microstep: 5091.03 | bwd_allreduce_microstep: 51.22 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-29 20:29:53,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.53 | bwd_microstep: 5115.83 | bwd_inner_microstep: 5050.85 | bwd_allreduce_microstep: 64.91 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3748 [2024-07-29 20:30:02,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.25 | bwd_microstep: 5002.72 | bwd_inner_microstep: 4983.29 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 
[2024-07-29 20:30:11,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 20:30:11,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.08 | bwd_microstep: 5168.07 | bwd_inner_microstep: 5094.07 | bwd_allreduce_microstep: 73.92 | step_microstep: 181.19 [2024-07-29 20:30:11,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29241.09 | bwd: 40896.90 | bwd_inner: 40152.65 | bwd_allreduce: 743.77 | step: 181.78 67%|██████▋ | 450/671 [8:46:56<4:15:37, 69.40s/it] {'loss': 1.1627, 'learning_rate': 5.1824632589828465e-06, 'epoch': 0.67} 67%|██████▋ | 450/671 [8:46:56<4:15:37, 69.40s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2108 [2024-07-29 20:30:20,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.70 | bwd_microstep: 5407.72 | bwd_inner_microstep: 4990.39 | bwd_allreduce_microstep: 417.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3842 [2024-07-29 20:30:28,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.11 | bwd_microstep: 5088.78 | bwd_inner_microstep: 5069.42 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3802 [2024-07-29 20:30:37,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.83 | bwd_microstep: 5025.85 | bwd_inner_microstep: 5006.45 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2110 [2024-07-29 20:30:46,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.53 | bwd_microstep: 5176.88 | bwd_inner_microstep: 4776.46 | bwd_allreduce_microstep: 400.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-29 20:30:55,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3538.09 | bwd_microstep: 5051.48 | bwd_inner_microstep: 4990.37 | bwd_allreduce_microstep: 61.03 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3710 [2024-07-29 20:31:03,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.09 | bwd_microstep: 5065.58 | bwd_inner_microstep: 5022.40 | bwd_allreduce_microstep: 43.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3714 [2024-07-29 20:31:12,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.12 | bwd_microstep: 5172.45 | bwd_inner_microstep: 5116.50 | bwd_allreduce_microstep: 55.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-29 20:31:21,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 20:31:21,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.24 | bwd_microstep: 5181.00 | bwd_inner_microstep: 5104.29 | bwd_allreduce_microstep: 76.64 | step_microstep: 180.79 [2024-07-29 20:31:21,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29174.60 | bwd: 41169.72 | bwd_inner: 40076.21 | bwd_allreduce: 1093.03 | step: 181.38 67%|██████▋ | 451/671 [8:48:07<4:15:52, 69.78s/it] {'loss': 1.1736, 'learning_rate': 5.14016586757394e-06, 'epoch': 0.67} 67%|██████▋ | 451/671 [8:48:07<4:15:52, 69.78s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3894 [2024-07-29 20:31:30,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3810.47 | bwd_microstep: 5134.33 | bwd_inner_microstep: 5115.17 | bwd_allreduce_microstep: 19.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3577 [2024-07-29 20:31:39,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.42 | bwd_microstep: 5111.45 | bwd_inner_microstep: 5032.68 | 
bwd_allreduce_microstep: 78.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599 [2024-07-29 20:31:48,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.21 | bwd_microstep: 5217.71 | bwd_inner_microstep: 5134.01 | bwd_allreduce_microstep: 83.64 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3788 [2024-07-29 20:31:57,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.54 | bwd_microstep: 5031.33 | bwd_inner_microstep: 5010.43 | bwd_allreduce_microstep: 20.83 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2166 [2024-07-29 20:32:05,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.66 | bwd_microstep: 5230.94 | bwd_inner_microstep: 4823.85 | bwd_allreduce_microstep: 407.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3700 [2024-07-29 20:32:14,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3693.33 | bwd_microstep: 4896.78 | bwd_inner_microstep: 4877.36 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-29 20:32:22,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.90 | bwd_microstep: 4963.06 | bwd_inner_microstep: 4920.47 | bwd_allreduce_microstep: 42.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2177 [2024-07-29 20:32:31,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 20:32:31,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.44 | bwd_microstep: 5129.21 | bwd_inner_microstep: 4732.29 | bwd_allreduce_microstep: 396.84 | step_microstep: 192.81 [2024-07-29 20:32:31,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd: 29099.87 | bwd: 40714.82 | bwd_inner: 39646.19 | bwd_allreduce: 1068.14 | step: 193.36 67%|██████▋ | 452/671 [8:49:17<4:15:06, 69.89s/it] {'loss': 1.1486, 'learning_rate': 5.097982001719994e-06, 'epoch': 0.67} 67%|██████▋ | 452/671 [8:49:17<4:15:06, 69.89s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2391 [2024-07-29 20:32:40,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.48 | bwd_microstep: 5367.26 | bwd_inner_microstep: 4955.16 | bwd_allreduce_microstep: 412.04 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2242 [2024-07-29 20:32:49,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.85 | bwd_microstep: 5280.17 | bwd_inner_microstep: 4871.58 | bwd_allreduce_microstep: 408.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3797 [2024-07-29 20:32:58,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.76 | bwd_microstep: 5190.35 | bwd_inner_microstep: 5137.96 | bwd_allreduce_microstep: 52.33 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2087 [2024-07-29 20:33:07,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.73 | bwd_microstep: 5152.85 | bwd_inner_microstep: 4751.14 | bwd_allreduce_microstep: 401.64 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3748 [2024-07-29 20:33:15,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.31 | bwd_microstep: 4994.71 | bwd_inner_microstep: 4975.40 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-29 20:33:24,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.30 | bwd_microstep: 5179.58 | bwd_inner_microstep: 5107.16 | bwd_allreduce_microstep: 
72.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730 [2024-07-29 20:33:33,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.03 | bwd_microstep: 5040.57 | bwd_inner_microstep: 5014.32 | bwd_allreduce_microstep: 26.18 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2133 [2024-07-29 20:33:42,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.45 [2024-07-29 20:33:42,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.23 | bwd_microstep: 5165.20 | bwd_inner_microstep: 4763.70 | bwd_allreduce_microstep: 401.44 | step_microstep: 181.81 [2024-07-29 20:33:42,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28929.60 | bwd: 41370.67 | bwd_inner: 39576.36 | bwd_allreduce: 1793.85 | step: 182.37 68%|██████▊ | 453/671 [8:50:28<4:14:45, 70.12s/it] {'loss': 1.2196, 'learning_rate': 5.0559126468346354e-06, 'epoch': 0.67} 68%|██████▊ | 453/671 [8:50:28<4:14:45, 70.12s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2366 [2024-07-29 20:33:51,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.94 | bwd_microstep: 5320.39 | bwd_inner_microstep: 4908.70 | bwd_allreduce_microstep: 411.63 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2220 [2024-07-29 20:34:00,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3488.96 | bwd_microstep: 5153.70 | bwd_inner_microstep: 4755.36 | bwd_allreduce_microstep: 398.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2235 [2024-07-29 20:34:08,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.57 | bwd_microstep: 5186.09 | bwd_inner_microstep: 4780.53 | bwd_allreduce_microstep: 405.49 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, 
dynamic token length: 2189 [2024-07-29 20:34:17,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.90 | bwd_microstep: 5228.48 | bwd_inner_microstep: 4822.74 | bwd_allreduce_microstep: 405.67 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-29 20:34:26,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.70 | bwd_microstep: 4960.23 | bwd_inner_microstep: 4940.91 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201 [2024-07-29 20:34:34,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3019.09 | bwd_microstep: 4928.92 | bwd_inner_microstep: 4548.98 | bwd_allreduce_microstep: 379.87 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2169 [2024-07-29 20:34:42,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3020.22 | bwd_microstep: 4897.54 | bwd_inner_microstep: 4520.44 | bwd_allreduce_microstep: 377.04 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695 [2024-07-29 20:34:51,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-29 20:34:51,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.09 | bwd_microstep: 5072.93 | bwd_inner_microstep: 5017.47 | bwd_allreduce_microstep: 55.40 | step_microstep: 180.88 [2024-07-29 20:34:51,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27553.37 | bwd: 40748.25 | bwd_inner: 38295.06 | bwd_allreduce: 2452.72 | step: 181.46 68%|██████▊ | 454/671 [8:51:37<4:11:58, 69.67s/it] {'loss': 1.2005, 'learning_rate': 5.013958785656516e-06, 'epoch': 0.68} 68%|██████▊ | 454/671 [8:51:37<4:11:58, 69.67s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3957 [2024-07-29 20:35:00,130] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3837.49 | bwd_microstep: 5179.98 | bwd_inner_microstep: 5160.83 | bwd_allreduce_microstep: 19.08 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3586 [2024-07-29 20:35:08,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.70 | bwd_microstep: 5081.52 | bwd_inner_microstep: 5014.08 | bwd_allreduce_microstep: 67.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2246 [2024-07-29 20:35:17,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.64 | bwd_microstep: 5171.75 | bwd_inner_microstep: 4769.59 | bwd_allreduce_microstep: 402.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2215 [2024-07-29 20:35:26,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.31 | bwd_microstep: 5230.47 | bwd_inner_microstep: 4822.25 | bwd_allreduce_microstep: 408.15 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2254 [2024-07-29 20:35:34,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3077.84 | bwd_microstep: 5044.67 | bwd_inner_microstep: 4655.15 | bwd_allreduce_microstep: 389.45 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2145 [2024-07-29 20:35:43,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.31 | bwd_microstep: 5197.05 | bwd_inner_microstep: 4795.04 | bwd_allreduce_microstep: 401.94 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156 [2024-07-29 20:35:51,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2989.63 | bwd_microstep: 4868.77 | bwd_inner_microstep: 4494.40 | bwd_allreduce_microstep: 374.31 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, 
dynamic token length: 3698 [2024-07-29 20:35:59,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-29 20:35:59,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3075.58 | bwd_microstep: 4877.43 | bwd_inner_microstep: 4833.88 | bwd_allreduce_microstep: 43.49 | step_microstep: 181.01 [2024-07-29 20:35:59,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27108.40 | bwd: 40651.61 | bwd_inner: 38545.15 | bwd_allreduce: 2105.98 | step: 181.59 68%|██████▊ | 455/671 [8:52:45<4:09:05, 69.19s/it] {'loss': 1.1786, 'learning_rate': 4.972121398226371e-06, 'epoch': 0.68} 68%|██████▊ | 455/671 [8:52:45<4:09:05, 69.19s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3909 [2024-07-29 20:36:08,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.62 | bwd_microstep: 5412.21 | bwd_inner_microstep: 5348.53 | bwd_allreduce_microstep: 63.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-29 20:36:17,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.82 | bwd_microstep: 5123.48 | bwd_inner_microstep: 5050.27 | bwd_allreduce_microstep: 73.15 | step_microstep: 0.19 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2078 [2024-07-29 20:36:25,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.48 | bwd_microstep: 5244.84 | bwd_inner_microstep: 4838.44 | bwd_allreduce_microstep: 406.34 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-29 20:36:34,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.17 | bwd_microstep: 5178.64 | bwd_inner_microstep: 5095.69 | bwd_allreduce_microstep: 82.88 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730 [2024-07-29 20:36:43,440] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.58 | bwd_microstep: 4983.14 | bwd_inner_microstep: 4963.69 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644 [2024-07-29 20:36:52,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.57 | bwd_microstep: 5096.01 | bwd_inner_microstep: 5026.98 | bwd_allreduce_microstep: 68.96 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3688 [2024-07-29 20:37:00,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.40 | bwd_microstep: 5006.64 | bwd_inner_microstep: 4939.75 | bwd_allreduce_microstep: 66.83 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2142 [2024-07-29 20:37:09,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 20:37:09,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.91 | bwd_microstep: 5123.19 | bwd_inner_microstep: 4726.62 | bwd_allreduce_microstep: 396.51 | step_microstep: 181.64 [2024-07-29 20:37:09,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28845.43 | bwd: 41168.14 | bwd_inner: 39989.89 | bwd_allreduce: 1177.78 | step: 182.33 68%|██████▊ | 456/671 [8:53:55<4:09:10, 69.54s/it] {'loss': 1.1405, 'learning_rate': 4.930401461864096e-06, 'epoch': 0.68} 68%|██████▊ | 456/671 [8:53:55<4:09:10, 69.54s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3911 [2024-07-29 20:37:18,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3809.31 | bwd_microstep: 5148.85 | bwd_inner_microstep: 5129.71 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3906 [2024-07-29 20:37:27,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3653.63 | 
bwd_microstep: 5090.67 | bwd_inner_microstep: 5057.91 | bwd_allreduce_microstep: 32.70 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3777 [2024-07-29 20:37:36,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3784.61 | bwd_microstep: 5034.50 | bwd_inner_microstep: 5011.02 | bwd_allreduce_microstep: 23.42 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2234 [2024-07-29 20:37:44,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.37 | bwd_microstep: 5150.59 | bwd_inner_microstep: 4749.14 | bwd_allreduce_microstep: 401.39 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1177 [2024-07-29 20:37:53,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3459.88 | bwd_microstep: 5181.58 | bwd_inner_microstep: 4779.11 | bwd_allreduce_microstep: 402.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730 [2024-07-29 20:38:02,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.48 | bwd_microstep: 5048.45 | bwd_inner_microstep: 5022.77 | bwd_allreduce_microstep: 25.62 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3706 [2024-07-29 20:38:10,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.76 | bwd_microstep: 5095.37 | bwd_inner_microstep: 5021.69 | bwd_allreduce_microstep: 73.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-29 20:38:19,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 20:38:19,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.39 | bwd_microstep: 4986.76 | bwd_inner_microstep: 4935.75 | bwd_allreduce_microstep: 50.94 | step_microstep: 180.34 [2024-07-29 
20:38:19,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29096.33 | bwd: 40736.77 | bwd_inner: 39707.04 | bwd_allreduce: 1029.27 | step: 180.92 68%|██████▊ | 457/671 [8:55:05<4:08:41, 69.73s/it] {'loss': 1.15, 'learning_rate': 4.888799951145948e-06, 'epoch': 0.68} 68%|██████▊ | 457/671 [8:55:05<4:08:41, 69.73s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3856 [2024-07-29 20:38:28,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3651.72 | bwd_microstep: 5228.67 | bwd_inner_microstep: 5188.26 | bwd_allreduce_microstep: 40.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3844 [2024-07-29 20:38:37,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.36 | bwd_microstep: 5105.34 | bwd_inner_microstep: 5085.91 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768 [2024-07-29 20:38:46,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.18 | bwd_microstep: 4999.44 | bwd_inner_microstep: 4980.01 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3750 [2024-07-29 20:38:54,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.82 | bwd_microstep: 5008.24 | bwd_inner_microstep: 4988.91 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2186 [2024-07-29 20:39:03,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.64 | bwd_microstep: 5227.70 | bwd_inner_microstep: 4820.60 | bwd_allreduce_microstep: 407.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645 [2024-07-29 20:39:12,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.99 | bwd_microstep: 
4998.16 | bwd_inner_microstep: 4941.14 | bwd_allreduce_microstep: 56.96 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175 [2024-07-29 20:39:20,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3508.22 | bwd_microstep: 5101.88 | bwd_inner_microstep: 4708.47 | bwd_allreduce_microstep: 393.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664 [2024-07-29 20:39:29,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.44 [2024-07-29 20:39:29,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.00 | bwd_microstep: 5144.53 | bwd_inner_microstep: 5078.31 | bwd_allreduce_microstep: 66.16 | step_microstep: 180.70 [2024-07-29 20:39:29,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29082.84 | bwd: 40813.95 | bwd_inner: 39791.54 | bwd_allreduce: 1021.93 | step: 181.28 68%|██████▊ | 458/671 [8:56:15<4:08:03, 69.88s/it] {'loss': 1.1965, 'learning_rate': 4.847317837881757e-06, 'epoch': 0.68} 68%|██████▊ | 458/671 [8:56:15<4:08:03, 69.88s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3682 [2024-07-29 20:39:38,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.48 | bwd_microstep: 5187.21 | bwd_inner_microstep: 5098.19 | bwd_allreduce_microstep: 88.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3842 [2024-07-29 20:39:47,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.43 | bwd_microstep: 5104.22 | bwd_inner_microstep: 5084.82 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3780 [2024-07-29 20:39:56,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.31 | bwd_microstep: 5191.63 | bwd_inner_microstep: 5140.88 | bwd_allreduce_microstep: 50.69 | 
step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2210 [2024-07-29 20:40:05,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.30 | bwd_microstep: 5255.33 | bwd_inner_microstep: 4849.13 | bwd_allreduce_microstep: 406.13 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2185 [2024-07-29 20:40:13,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3488.51 | bwd_microstep: 5128.45 | bwd_inner_microstep: 4729.56 | bwd_allreduce_microstep: 398.82 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3759 [2024-07-29 20:40:22,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.36 | bwd_microstep: 4995.00 | bwd_inner_microstep: 4975.66 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3751 [2024-07-29 20:40:31,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.37 | bwd_microstep: 5043.59 | bwd_inner_microstep: 5002.64 | bwd_allreduce_microstep: 40.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-29 20:40:39,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 20:40:39,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3202.45 | bwd_microstep: 4707.67 | bwd_inner_microstep: 4683.83 | bwd_allreduce_microstep: 23.77 | step_microstep: 181.26 [2024-07-29 20:40:39,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28642.12 | bwd: 40613.10 | bwd_inner: 39564.66 | bwd_allreduce: 1047.96 | step: 181.83 68%|██████▊ | 459/671 [8:57:25<4:06:35, 69.79s/it] {'loss': 1.1939, 'learning_rate': 4.805956091092228e-06, 'epoch': 0.68} 68%|██████▊ | 459/671 [8:57:25<4:06:35, 69.79s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic 
token length: 3863 [2024-07-29 20:40:48,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.80 | bwd_microstep: 5213.44 | bwd_inner_microstep: 5173.73 | bwd_allreduce_microstep: 39.65 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3809 [2024-07-29 20:40:57,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.79 | bwd_microstep: 5097.79 | bwd_inner_microstep: 5067.26 | bwd_allreduce_microstep: 30.46 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2215 [2024-07-29 20:41:05,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3474.63 | bwd_microstep: 5138.06 | bwd_inner_microstep: 4739.70 | bwd_allreduce_microstep: 398.29 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3706 [2024-07-29 20:41:13,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3086.01 | bwd_microstep: 4706.64 | bwd_inner_microstep: 4679.23 | bwd_allreduce_microstep: 27.34 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-29 20:41:22,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.52 | bwd_microstep: 5024.97 | bwd_inner_microstep: 4969.01 | bwd_allreduce_microstep: 55.90 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737 [2024-07-29 20:41:30,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.14 | bwd_microstep: 4979.96 | bwd_inner_microstep: 4960.59 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3667 [2024-07-29 20:41:39,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.50 | bwd_microstep: 4892.21 | bwd_inner_microstep: 4872.83 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 
dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2142 [2024-07-29 20:41:47,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 20:41:47,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3010.57 | bwd_microstep: 4905.73 | bwd_inner_microstep: 4527.02 | bwd_allreduce_microstep: 378.65 | step_microstep: 181.14 [2024-07-29 20:41:47,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27891.86 | bwd: 39958.78 | bwd_inner: 38989.31 | bwd_allreduce: 968.98 | step: 181.72 69%|██████▊ | 460/671 [8:58:33<4:03:43, 69.31s/it] {'loss': 1.0932, 'learning_rate': 4.764715676986327e-06, 'epoch': 0.68} 69%|██████▊ | 460/671 [8:58:33<4:03:43, 69.31s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3983 [2024-07-29 20:41:56,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.87 | bwd_microstep: 5292.66 | bwd_inner_microstep: 5242.73 | bwd_allreduce_microstep: 49.86 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3821 [2024-07-29 20:42:05,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.29 | bwd_microstep: 5136.39 | bwd_inner_microstep: 5071.10 | bwd_allreduce_microstep: 65.22 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2303 [2024-07-29 20:42:14,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.56 | bwd_microstep: 5201.99 | bwd_inner_microstep: 4797.09 | bwd_allreduce_microstep: 404.84 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2230 [2024-07-29 20:42:22,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.35 | bwd_microstep: 5177.55 | bwd_inner_microstep: 4776.61 | bwd_allreduce_microstep: 400.87 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 
[2024-07-29 20:42:31,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.78 | bwd_microstep: 5153.45 | bwd_inner_microstep: 5077.88 | bwd_allreduce_microstep: 75.51 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2178 [2024-07-29 20:42:40,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3438.72 | bwd_microstep: 5023.61 | bwd_inner_microstep: 4635.69 | bwd_allreduce_microstep: 387.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665 [2024-07-29 20:42:48,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.49 | bwd_microstep: 5026.39 | bwd_inner_microstep: 4972.82 | bwd_allreduce_microstep: 53.51 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3694 [2024-07-29 20:42:57,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.79 [2024-07-29 20:42:57,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3711.01 | bwd_microstep: 4897.38 | bwd_inner_microstep: 4874.14 | bwd_allreduce_microstep: 23.18 | step_microstep: 181.49 [2024-07-29 20:42:57,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28631.98 | bwd: 40909.40 | bwd_inner: 39447.99 | bwd_allreduce: 1460.94 | step: 182.08 69%|██████▊ | 461/671 [8:59:43<4:03:09, 69.47s/it] {'loss': 1.1657, 'learning_rate': 4.7235975589386715e-06, 'epoch': 0.69} 69%|██████▊ | 461/671 [8:59:43<4:03:09, 69.47s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2347 [2024-07-29 20:43:06,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.39 | bwd_microstep: 5351.58 | bwd_inner_microstep: 4939.72 | bwd_allreduce_microstep: 411.80 | step_microstep: 0.18 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2081 [2024-07-29 20:43:15,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3558.96 | bwd_microstep: 5247.44 | bwd_inner_microstep: 4840.15 | bwd_allreduce_microstep: 407.23 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2177 [2024-07-29 20:43:24,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.35 | bwd_microstep: 5207.31 | bwd_inner_microstep: 4804.05 | bwd_allreduce_microstep: 403.20 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755 [2024-07-29 20:43:32,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.59 | bwd_microstep: 5064.20 | bwd_inner_microstep: 5035.50 | bwd_allreduce_microstep: 28.63 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-29 20:43:41,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.06 | bwd_microstep: 5160.10 | bwd_inner_microstep: 5085.18 | bwd_allreduce_microstep: 74.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-29 20:43:50,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.39 | bwd_microstep: 5179.35 | bwd_inner_microstep: 5109.02 | bwd_allreduce_microstep: 70.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3699 [2024-07-29 20:43:58,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3184.02 | bwd_microstep: 4696.41 | bwd_inner_microstep: 4677.06 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3714 [2024-07-29 20:44:07,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 20:44:07,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.86 | bwd_microstep: 5032.99 | bwd_inner_microstep: 5008.38 | bwd_allreduce_microstep: 24.54 | step_microstep: 
180.73 [2024-07-29 20:44:07,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28659.51 | bwd: 40939.37 | bwd_inner: 39499.01 | bwd_allreduce: 1439.88 | step: 181.40 69%|██████▉ | 462/671 [9:00:53<4:02:28, 69.61s/it] {'loss': 1.1878, 'learning_rate': 4.6826026974670665e-06, 'epoch': 0.69} 69%|██████▉ | 462/671 [9:00:53<4:02:28, 69.61s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3983 [2024-07-29 20:44:16,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.76 | bwd_microstep: 5393.02 | bwd_inner_microstep: 5344.78 | bwd_allreduce_microstep: 48.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3761 [2024-07-29 20:44:25,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.20 | bwd_microstep: 5189.69 | bwd_inner_microstep: 5133.69 | bwd_allreduce_microstep: 55.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-07-29 20:44:34,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.34 | bwd_microstep: 5159.27 | bwd_inner_microstep: 5104.24 | bwd_allreduce_microstep: 54.97 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2184 [2024-07-29 20:44:42,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.22 | bwd_microstep: 5191.36 | bwd_inner_microstep: 4785.55 | bwd_allreduce_microstep: 405.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-07-29 20:44:51,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3241.37 | bwd_microstep: 4785.38 | bwd_inner_microstep: 4766.08 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3687 [2024-07-29 20:44:59,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3672.92 | bwd_microstep: 4886.47 | bwd_inner_microstep: 4866.30 | bwd_allreduce_microstep: 20.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-07-29 20:45:08,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.11 | bwd_microstep: 4931.59 | bwd_inner_microstep: 4901.84 | bwd_allreduce_microstep: 29.69 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2146 [2024-07-29 20:45:16,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 20:45:16,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.65 | bwd_microstep: 5119.32 | bwd_inner_microstep: 4721.40 | bwd_allreduce_microstep: 397.86 | step_microstep: 180.64 [2024-07-29 20:45:16,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28497.48 | bwd: 40656.08 | bwd_inner: 39623.80 | bwd_allreduce: 1031.81 | step: 181.21 69%|██████▉ | 463/671 [9:02:02<4:01:11, 69.57s/it] {'loss': 1.1552, 'learning_rate': 4.641732050210036e-06, 'epoch': 0.69} 69%|██████▉ | 463/671 [9:02:02<4:01:11, 69.57s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2371 [2024-07-29 20:45:26,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.34 | bwd_microstep: 5412.64 | bwd_inner_microstep: 4997.61 | bwd_allreduce_microstep: 414.96 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3806 [2024-07-29 20:45:34,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.26 | bwd_microstep: 5016.69 | bwd_inner_microstep: 4997.06 | bwd_allreduce_microstep: 19.54 | step_microstep: 0.11 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3611 [2024-07-29 20:45:42,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.95 | bwd_microstep: 4838.74 | bwd_inner_microstep: 4791.66 | 
bwd_allreduce_microstep: 47.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-29 20:45:51,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.18 | bwd_microstep: 5219.69 | bwd_inner_microstep: 5136.19 | bwd_allreduce_microstep: 83.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636 [2024-07-29 20:46:00,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.39 | bwd_microstep: 5084.00 | bwd_inner_microstep: 5021.51 | bwd_allreduce_microstep: 62.43 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 [2024-07-29 20:46:09,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3697.72 | bwd_microstep: 4885.00 | bwd_inner_microstep: 4865.60 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701 [2024-07-29 20:46:17,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.98 | bwd_microstep: 4949.53 | bwd_inner_microstep: 4906.36 | bwd_allreduce_microstep: 43.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3694 [2024-07-29 20:46:26,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.90 [2024-07-29 20:46:26,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.18 | bwd_microstep: 4927.54 | bwd_inner_microstep: 4903.21 | bwd_allreduce_microstep: 24.27 | step_microstep: 180.95 [2024-07-29 20:46:26,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28729.89 | bwd: 40333.83 | bwd_inner: 39619.14 | bwd_allreduce: 714.21 | step: 181.55 69%|██████▉ | 464/671 [9:03:12<3:59:51, 69.52s/it] {'loss': 1.0927, 'learning_rate': 4.6009865719044645e-06, 'epoch': 0.69} 69%|██████▉ | 464/671 [9:03:12<3:59:51, 69.52s/it]dynamic ViT batch size: 26, 
images per sample: 13.0, dynamic token length: 3993 [2024-07-29 20:46:35,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3864.30 | bwd_microstep: 5240.43 | bwd_inner_microstep: 5221.26 | bwd_allreduce_microstep: 19.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3751 [2024-07-29 20:46:44,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.19 | bwd_microstep: 4993.61 | bwd_inner_microstep: 4974.21 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3804 [2024-07-29 20:46:53,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.03 | bwd_microstep: 5177.16 | bwd_inner_microstep: 5128.26 | bwd_allreduce_microstep: 48.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3744 [2024-07-29 20:47:01,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.51 | bwd_microstep: 5175.82 | bwd_inner_microstep: 5118.47 | bwd_allreduce_microstep: 57.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-07-29 20:47:10,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.93 | bwd_microstep: 4993.18 | bwd_inner_microstep: 4973.83 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3643 [2024-07-29 20:47:19,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.40 | bwd_microstep: 5027.15 | bwd_inner_microstep: 4951.13 | bwd_allreduce_microstep: 75.95 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2117 [2024-07-29 20:47:27,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3017.22 | bwd_microstep: 4909.86 | bwd_inner_microstep: 4532.57 | bwd_allreduce_microstep: 
377.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-29 20:47:36,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 20:47:36,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3859.47 | bwd_microstep: 4907.51 | bwd_inner_microstep: 4845.09 | bwd_allreduce_microstep: 62.35 | step_microstep: 180.66 [2024-07-29 20:47:36,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28991.95 | bwd: 40424.70 | bwd_inner: 39744.76 | bwd_allreduce: 679.47 | step: 181.22 69%|██████▉ | 465/671 [9:04:22<3:58:55, 69.59s/it] {'loss': 1.0978, 'learning_rate': 4.560367214363295e-06, 'epoch': 0.69} 69%|██████▉ | 465/671 [9:04:22<3:58:55, 69.59s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3888 [2024-07-29 20:47:45,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3788.88 | bwd_microstep: 5124.83 | bwd_inner_microstep: 5105.63 | bwd_allreduce_microstep: 19.13 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3602 [2024-07-29 20:47:53,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3113.64 | bwd_microstep: 5006.69 | bwd_inner_microstep: 4942.50 | bwd_allreduce_microstep: 64.13 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3784 [2024-07-29 20:48:02,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.09 | bwd_microstep: 5088.77 | bwd_inner_microstep: 5064.93 | bwd_allreduce_microstep: 23.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3604 [2024-07-29 20:48:10,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.55 | bwd_microstep: 5045.88 | bwd_inner_microstep: 4982.52 | bwd_allreduce_microstep: 63.30 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, 
dynamic token length: 2072 [2024-07-29 20:48:19,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.85 | bwd_microstep: 5202.67 | bwd_inner_microstep: 4797.58 | bwd_allreduce_microstep: 405.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3776 [2024-07-29 20:48:28,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.24 | bwd_microstep: 5189.26 | bwd_inner_microstep: 5133.34 | bwd_allreduce_microstep: 55.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-29 20:48:36,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.02 | bwd_microstep: 5093.23 | bwd_inner_microstep: 5043.28 | bwd_allreduce_microstep: 49.89 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172 [2024-07-29 20:48:45,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 20:48:45,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.73 | bwd_microstep: 5148.18 | bwd_inner_microstep: 4749.04 | bwd_allreduce_microstep: 399.08 | step_microstep: 180.89 [2024-07-29 20:48:45,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28476.90 | bwd: 40899.49 | bwd_inner: 39818.75 | bwd_allreduce: 1080.27 | step: 181.45 69%|██████▉ | 466/671 [9:05:31<3:57:52, 69.62s/it] {'loss': 1.1518, 'learning_rate': 4.519874926453303e-06, 'epoch': 0.69} 69%|██████▉ | 466/671 [9:05:31<3:57:52, 69.62s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2353 [2024-07-29 20:48:54,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.96 | bwd_microstep: 5530.41 | bwd_inner_microstep: 5123.82 | bwd_allreduce_microstep: 406.52 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2337 [2024-07-29 20:49:02,794] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3019.14 | bwd_microstep: 4846.18 | bwd_inner_microstep: 4473.51 | bwd_allreduce_microstep: 372.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-29 20:49:11,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.52 | bwd_microstep: 5172.25 | bwd_inner_microstep: 5094.33 | bwd_allreduce_microstep: 77.85 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2178 [2024-07-29 20:49:20,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3471.64 | bwd_microstep: 5104.35 | bwd_inner_microstep: 4707.25 | bwd_allreduce_microstep: 397.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3616 [2024-07-29 20:49:28,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3193.01 | bwd_microstep: 4727.22 | bwd_inner_microstep: 4696.10 | bwd_allreduce_microstep: 31.06 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-29 20:49:36,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3508.01 | bwd_microstep: 5080.89 | bwd_inner_microstep: 4686.73 | bwd_allreduce_microstep: 394.09 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2123 [2024-07-29 20:49:45,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.81 | bwd_microstep: 5042.16 | bwd_inner_microstep: 4651.02 | bwd_allreduce_microstep: 391.07 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693 [2024-07-29 20:49:54,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-29 20:49:54,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.47 | bwd_microstep: 4946.13 | bwd_inner_microstep: 4916.40 | 
bwd_allreduce_microstep: 29.67 | step_microstep: 181.44 [2024-07-29 20:49:54,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27556.46 | bwd: 40449.56 | bwd_inner: 38349.10 | bwd_allreduce: 2100.00 | step: 182.00 70%|██████▉ | 467/671 [9:06:40<3:55:23, 69.24s/it] {'loss': 1.0652, 'learning_rate': 4.479510654072905e-06, 'epoch': 0.7} 70%|██████▉ | 467/671 [9:06:40<3:55:23, 69.24s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2256 [2024-07-29 20:50:03,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.64 | bwd_microstep: 5351.02 | bwd_inner_microstep: 4939.83 | bwd_allreduce_microstep: 411.13 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3594 [2024-07-29 20:50:12,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.25 | bwd_microstep: 5294.01 | bwd_inner_microstep: 5173.52 | bwd_allreduce_microstep: 120.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-29 20:50:20,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.68 | bwd_microstep: 5146.65 | bwd_inner_microstep: 5073.75 | bwd_allreduce_microstep: 72.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-07-29 20:50:29,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.36 | bwd_microstep: 5203.14 | bwd_inner_microstep: 5117.61 | bwd_allreduce_microstep: 85.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-29 20:50:38,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.03 | bwd_microstep: 5167.94 | bwd_inner_microstep: 5087.28 | bwd_allreduce_microstep: 80.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-29 20:50:47,284] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.77 | bwd_microstep: 5158.15 | bwd_inner_microstep: 5085.25 | bwd_allreduce_microstep: 72.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3764 [2024-07-29 20:50:56,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.30 | bwd_microstep: 5104.72 | bwd_inner_microstep: 5063.10 | bwd_allreduce_microstep: 41.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-07-29 20:51:04,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 20:51:04,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3418.11 | bwd_microstep: 4992.72 | bwd_inner_microstep: 4948.58 | bwd_allreduce_microstep: 44.07 | step_microstep: 182.97 [2024-07-29 20:51:04,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28744.04 | bwd: 41418.33 | bwd_inner: 40488.85 | bwd_allreduce: 929.00 | step: 183.54 70%|██████▉ | 468/671 [9:07:50<3:55:31, 69.61s/it] {'loss': 1.1441, 'learning_rate': 4.439275340130099e-06, 'epoch': 0.7} 70%|██████▉ | 468/671 [9:07:50<3:55:31, 69.61s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-29 20:51:13,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.88 | bwd_microstep: 5483.13 | bwd_inner_microstep: 5394.80 | bwd_allreduce_microstep: 88.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3804 [2024-07-29 20:51:22,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.03 | bwd_microstep: 5036.82 | bwd_inner_microstep: 5017.37 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-29 20:51:31,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.73 | 
bwd_microstep: 5016.06 | bwd_inner_microstep: 4996.00 | bwd_allreduce_microstep: 19.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631 [2024-07-29 20:51:40,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.04 | bwd_microstep: 5104.02 | bwd_inner_microstep: 5034.13 | bwd_allreduce_microstep: 69.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3734 [2024-07-29 20:51:48,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.94 | bwd_microstep: 5153.37 | bwd_inner_microstep: 5100.52 | bwd_allreduce_microstep: 52.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-07-29 20:51:56,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3183.48 | bwd_microstep: 4692.89 | bwd_inner_microstep: 4673.55 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3711 [2024-07-29 20:52:05,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.87 | bwd_microstep: 5136.98 | bwd_inner_microstep: 5048.09 | bwd_allreduce_microstep: 88.82 | step_microstep: 0.07 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2148 [2024-07-29 20:52:14,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 20:52:14,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.81 | bwd_microstep: 5125.94 | bwd_inner_microstep: 4728.81 | bwd_allreduce_microstep: 397.06 | step_microstep: 181.68 [2024-07-29 20:52:14,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28587.70 | bwd: 40749.19 | bwd_inner: 39993.22 | bwd_allreduce: 755.49 | step: 182.25 70%|██████▉ | 469/671 [9:09:00<3:54:25, 69.63s/it] {'loss': 1.0733, 'learning_rate': 4.399169924520403e-06, 'epoch': 0.7} 70%|██████▉ | 469/671 
[9:09:00<3:54:25, 69.63s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3546 [2024-07-29 20:52:23,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.33 | bwd_microstep: 5235.65 | bwd_inner_microstep: 5142.75 | bwd_allreduce_microstep: 92.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3834 [2024-07-29 20:52:32,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.78 | bwd_microstep: 5151.09 | bwd_inner_microstep: 5107.63 | bwd_allreduce_microstep: 43.40 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3773 [2024-07-29 20:52:40,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3252.81 | bwd_microstep: 4829.52 | bwd_inner_microstep: 4810.18 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.07 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2076 [2024-07-29 20:52:49,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.10 | bwd_microstep: 5268.55 | bwd_inner_microstep: 4861.28 | bwd_allreduce_microstep: 407.20 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2159 [2024-07-29 20:52:57,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.89 | bwd_microstep: 5257.95 | bwd_inner_microstep: 4850.84 | bwd_allreduce_microstep: 407.04 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-07-29 20:53:06,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.34 | bwd_microstep: 5001.23 | bwd_inner_microstep: 4978.93 | bwd_allreduce_microstep: 22.24 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2138 [2024-07-29 20:53:15,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3679.28 | bwd_microstep: 5206.85 | 
bwd_inner_microstep: 4802.41 | bwd_allreduce_microstep: 404.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2138 [2024-07-29 20:53:24,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 20:53:24,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.44 | bwd_microstep: 5112.05 | bwd_inner_microstep: 4714.52 | bwd_allreduce_microstep: 397.46 | step_microstep: 543.26 [2024-07-29 20:53:24,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28660.87 | bwd: 41062.87 | bwd_inner: 39268.47 | bwd_allreduce: 1793.92 | step: 543.83 70%|███████ | 470/671 [9:10:10<3:54:02, 69.86s/it] {'loss': 1.1216, 'learning_rate': 4.359195344104916e-06, 'epoch': 0.7} 70%|███████ | 470/671 [9:10:10<3:54:02, 69.86s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2431 [2024-07-29 20:53:32,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3106.57 | bwd_microstep: 5137.26 | bwd_inner_microstep: 4744.34 | bwd_allreduce_microstep: 392.86 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2021 [2024-07-29 20:53:41,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3077.73 | bwd_microstep: 5171.24 | bwd_inner_microstep: 4773.14 | bwd_allreduce_microstep: 398.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-29 20:53:49,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.30 | bwd_microstep: 5112.12 | bwd_inner_microstep: 5039.63 | bwd_allreduce_microstep: 72.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746 [2024-07-29 20:53:58,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3230.25 | bwd_microstep: 4826.78 | bwd_inner_microstep: 4805.24 | bwd_allreduce_microstep: 21.48 | step_microstep: 
0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3713 [2024-07-29 20:54:06,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3271.19 | bwd_microstep: 4939.03 | bwd_inner_microstep: 4898.86 | bwd_allreduce_microstep: 40.10 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2120 [2024-07-29 20:54:15,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.44 | bwd_microstep: 5218.07 | bwd_inner_microstep: 4811.76 | bwd_allreduce_microstep: 406.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2192 [2024-07-29 20:54:23,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.38 | bwd_microstep: 5226.14 | bwd_inner_microstep: 4818.96 | bwd_allreduce_microstep: 407.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-29 20:54:32,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.42 [2024-07-29 20:54:32,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.20 | bwd_microstep: 5022.35 | bwd_inner_microstep: 4969.13 | bwd_allreduce_microstep: 53.15 | step_microstep: 539.76 [2024-07-29 20:54:32,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 26937.95 | bwd: 40652.97 | bwd_inner: 38860.99 | bwd_allreduce: 1791.50 | step: 540.32 70%|███████ | 471/671 [9:11:18<3:51:17, 69.39s/it] {'loss': 1.1248, 'learning_rate': 4.319352532688444e-06, 'epoch': 0.7} 70%|███████ | 471/671 [9:11:18<3:51:17, 69.39s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3952 [2024-07-29 20:54:42,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3701.06 | bwd_microstep: 5359.18 | bwd_inner_microstep: 5302.14 | bwd_allreduce_microstep: 56.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3826 
[2024-07-29 20:54:50,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.42 | bwd_microstep: 5053.97 | bwd_inner_microstep: 5034.62 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3595 [2024-07-29 20:54:59,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3469.51 | bwd_microstep: 5162.74 | bwd_inner_microstep: 5085.27 | bwd_allreduce_microstep: 77.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3799 [2024-07-29 20:55:07,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3256.46 | bwd_microstep: 4834.73 | bwd_inner_microstep: 4815.35 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3739 [2024-07-29 20:55:16,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.42 | bwd_microstep: 5123.71 | bwd_inner_microstep: 5047.73 | bwd_allreduce_microstep: 75.92 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-29 20:55:25,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.93 | bwd_microstep: 5059.62 | bwd_inner_microstep: 5000.37 | bwd_allreduce_microstep: 59.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-29 20:55:33,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.16 | bwd_microstep: 5184.20 | bwd_inner_microstep: 5110.11 | bwd_allreduce_microstep: 74.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-07-29 20:55:42,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 20:55:42,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.88 | bwd_microstep: 4992.56 | 
bwd_inner_microstep: 4939.80 | bwd_allreduce_microstep: 52.70 | step_microstep: 181.19 [2024-07-29 20:55:42,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28503.75 | bwd: 40770.70 | bwd_inner: 40335.33 | bwd_allreduce: 434.89 | step: 181.74 70%|███████ | 472/671 [9:12:28<3:50:21, 69.45s/it] {'loss': 1.1487, 'learning_rate': 4.279642420997655e-06, 'epoch': 0.7} 70%|███████ | 472/671 [9:12:28<3:50:21, 69.45s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2398 [2024-07-29 20:55:51,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.74 | bwd_microstep: 5217.45 | bwd_inner_microstep: 4818.14 | bwd_allreduce_microstep: 399.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3846 [2024-07-29 20:56:00,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3653.98 | bwd_microstep: 5070.14 | bwd_inner_microstep: 5035.79 | bwd_allreduce_microstep: 34.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2266 [2024-07-29 20:56:08,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.64 | bwd_microstep: 5155.57 | bwd_inner_microstep: 4754.83 | bwd_allreduce_microstep: 400.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623 [2024-07-29 20:56:17,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.49 | bwd_microstep: 5136.80 | bwd_inner_microstep: 5060.97 | bwd_allreduce_microstep: 75.76 | step_microstep: 0.07 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-29 20:56:26,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3477.67 | bwd_microstep: 5063.81 | bwd_inner_microstep: 4669.83 | bwd_allreduce_microstep: 393.91 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717 [2024-07-29 
20:56:34,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.82 | bwd_microstep: 4995.18 | bwd_inner_microstep: 4975.80 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-29 20:56:43,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.57 | bwd_microstep: 5188.53 | bwd_inner_microstep: 5114.04 | bwd_allreduce_microstep: 74.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665 [2024-07-29 20:56:52,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-29 20:56:52,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.38 | bwd_microstep: 5020.33 | bwd_inner_microstep: 4966.92 | bwd_allreduce_microstep: 53.34 | step_microstep: 181.86 [2024-07-29 20:56:52,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28723.20 | bwd: 40847.78 | bwd_inner: 39396.27 | bwd_allreduce: 1451.05 | step: 182.41 70%|███████ | 473/671 [9:13:38<3:49:38, 69.59s/it] {'loss': 1.1352, 'learning_rate': 4.240065936659374e-06, 'epoch': 0.7} 70%|███████ | 473/671 [9:13:38<3:49:38, 69.59s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2396 [2024-07-29 20:57:01,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.46 | bwd_microstep: 5351.96 | bwd_inner_microstep: 4934.19 | bwd_allreduce_microstep: 417.70 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3838 [2024-07-29 20:57:10,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.81 | bwd_microstep: 5225.38 | bwd_inner_microstep: 5157.41 | bwd_allreduce_microstep: 67.91 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3869 [2024-07-29 20:57:19,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3794.01 | bwd_microstep: 5114.28 | bwd_inner_microstep: 5094.88 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3774 [2024-07-29 20:57:28,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3770.76 | bwd_microstep: 5024.25 | bwd_inner_microstep: 5002.82 | bwd_allreduce_microstep: 21.37 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2113 [2024-07-29 20:57:36,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.64 | bwd_microstep: 5234.12 | bwd_inner_microstep: 4828.11 | bwd_allreduce_microstep: 405.95 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-29 20:57:44,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3209.70 | bwd_microstep: 4731.88 | bwd_inner_microstep: 4707.16 | bwd_allreduce_microstep: 24.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-29 20:57:53,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.44 | bwd_microstep: 5290.96 | bwd_inner_microstep: 5181.36 | bwd_allreduce_microstep: 109.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-29 20:58:02,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-29 20:58:02,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.67 | bwd_microstep: 4981.73 | bwd_inner_microstep: 4933.02 | bwd_allreduce_microstep: 48.65 | step_microstep: 180.95 [2024-07-29 20:58:02,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28760.37 | bwd: 40954.54 | bwd_inner: 39838.89 | bwd_allreduce: 1115.19 | step: 181.52 71%|███████ | 474/671 [9:14:48<3:48:55, 69.72s/it] {'loss': 1.1473, 'learning_rate': 4.200624004178886e-06, 'epoch': 0.71} 71%|███████ 
| 474/671 [9:14:48<3:48:55, 69.72s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3878 [2024-07-29 20:58:11,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.28 | bwd_microstep: 5364.32 | bwd_inner_microstep: 5297.40 | bwd_allreduce_microstep: 66.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3785 [2024-07-29 20:58:19,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3230.15 | bwd_microstep: 4832.72 | bwd_inner_microstep: 4813.28 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3759 [2024-07-29 20:58:28,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.15 | bwd_microstep: 5080.38 | bwd_inner_microstep: 5054.06 | bwd_allreduce_microstep: 26.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754 [2024-07-29 20:58:37,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3782.36 | bwd_microstep: 5021.35 | bwd_inner_microstep: 4999.12 | bwd_allreduce_microstep: 22.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2223 [2024-07-29 20:58:46,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.73 | bwd_microstep: 5252.26 | bwd_inner_microstep: 4844.77 | bwd_allreduce_microstep: 407.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684 [2024-07-29 20:58:54,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.95 | bwd_microstep: 5119.31 | bwd_inner_microstep: 5056.20 | bwd_allreduce_microstep: 63.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-07-29 20:59:03,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.47 | 
bwd_microstep: 5090.71 | bwd_inner_microstep: 5027.80 | bwd_allreduce_microstep: 62.84 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-29 20:59:12,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.75 [2024-07-29 20:59:12,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.64 | bwd_microstep: 4926.37 | bwd_inner_microstep: 4907.04 | bwd_allreduce_microstep: 19.26 | step_microstep: 180.94 [2024-07-29 20:59:12,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28856.62 | bwd: 40687.39 | bwd_inner: 39999.63 | bwd_allreduce: 687.30 | step: 181.51 71%|███████ | 475/671 [9:15:58<3:47:55, 69.77s/it] {'loss': 1.1667, 'learning_rate': 4.1613175449183484e-06, 'epoch': 0.71} 71%|███████ | 475/671 [9:15:58<3:47:55, 69.77s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-29 20:59:21,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3693.07 | bwd_microstep: 5409.13 | bwd_inner_microstep: 5307.11 | bwd_allreduce_microstep: 101.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3794 [2024-07-29 20:59:29,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3262.30 | bwd_microstep: 4896.58 | bwd_inner_microstep: 4869.24 | bwd_allreduce_microstep: 27.27 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602 [2024-07-29 20:59:38,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.07 | bwd_microstep: 5175.08 | bwd_inner_microstep: 5090.33 | bwd_allreduce_microstep: 84.68 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741 [2024-07-29 20:59:47,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.40 | bwd_microstep: 5059.86 | bwd_inner_microstep: 5033.79 | bwd_allreduce_microstep: 
26.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-29 20:59:55,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.96 | bwd_microstep: 4990.51 | bwd_inner_microstep: 4943.25 | bwd_allreduce_microstep: 47.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665 [2024-07-29 21:00:03,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3182.11 | bwd_microstep: 4728.38 | bwd_inner_microstep: 4702.98 | bwd_allreduce_microstep: 25.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-29 21:00:12,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.32 | bwd_microstep: 5186.72 | bwd_inner_microstep: 5107.69 | bwd_allreduce_microstep: 78.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731 [2024-07-29 21:00:21,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 21:00:21,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.20 | bwd_microstep: 5001.24 | bwd_inner_microstep: 4981.89 | bwd_allreduce_microstep: 19.28 | step_microstep: 182.21 [2024-07-29 21:00:21,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28387.33 | bwd: 40447.49 | bwd_inner: 40036.21 | bwd_allreduce: 410.81 | step: 182.79 71%|███████ | 476/671 [9:17:07<3:46:10, 69.59s/it] {'loss': 1.1294, 'learning_rate': 4.12214747707527e-06, 'epoch': 0.71} 71%|███████ | 476/671 [9:17:07<3:46:10, 69.59s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2346 [2024-07-29 21:00:30,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.78 | bwd_microstep: 5190.53 | bwd_inner_microstep: 4790.18 | bwd_allreduce_microstep: 400.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, 
dynamic token length: 3805 [2024-07-29 21:00:39,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.04 | bwd_microstep: 5021.51 | bwd_inner_microstep: 5002.22 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3767 [2024-07-29 21:00:47,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.01 | bwd_microstep: 5228.78 | bwd_inner_microstep: 5166.60 | bwd_allreduce_microstep: 62.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-07-29 21:00:56,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3500.25 | bwd_microstep: 5136.12 | bwd_inner_microstep: 4737.30 | bwd_allreduce_microstep: 398.75 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-29 21:01:05,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.94 | bwd_microstep: 5191.50 | bwd_inner_microstep: 5132.35 | bwd_allreduce_microstep: 59.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-29 21:01:14,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.45 | bwd_microstep: 5047.53 | bwd_inner_microstep: 4991.04 | bwd_allreduce_microstep: 56.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2184 [2024-07-29 21:01:22,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.48 | bwd_microstep: 5170.17 | bwd_inner_microstep: 4769.77 | bwd_allreduce_microstep: 400.34 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2134 [2024-07-29 21:01:31,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 21:01:31,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.10 | 
bwd_microstep: 5200.72 | bwd_inner_microstep: 4796.32 | bwd_allreduce_microstep: 404.33 | step_microstep: 181.21 [2024-07-29 21:01:31,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28604.95 | bwd: 41186.84 | bwd_inner: 39385.71 | bwd_allreduce: 1800.66 | step: 181.79 71%|███████ | 477/671 [9:18:17<3:45:31, 69.75s/it] {'loss': 1.1839, 'learning_rate': 4.083114715661069e-06, 'epoch': 0.71} 71%|███████ | 477/671 [9:18:17<3:45:31, 69.75s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3845 [2024-07-29 21:01:40,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3647.21 | bwd_microstep: 5082.07 | bwd_inner_microstep: 5046.78 | bwd_allreduce_microstep: 35.22 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3779 [2024-07-29 21:01:49,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.25 | bwd_microstep: 5032.04 | bwd_inner_microstep: 5010.69 | bwd_allreduce_microstep: 21.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-07-29 21:01:58,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.72 | bwd_microstep: 5179.64 | bwd_inner_microstep: 5122.50 | bwd_allreduce_microstep: 57.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-29 21:02:06,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.85 | bwd_microstep: 5139.43 | bwd_inner_microstep: 4741.08 | bwd_allreduce_microstep: 398.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-29 21:02:14,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3180.70 | bwd_microstep: 4682.73 | bwd_inner_microstep: 4662.12 | bwd_allreduce_microstep: 20.55 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token 
length: 3699 [2024-07-29 21:02:23,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.20 | bwd_microstep: 5054.17 | bwd_inner_microstep: 4996.77 | bwd_allreduce_microstep: 57.33 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3744 [2024-07-29 21:02:31,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3143.61 | bwd_microstep: 4805.61 | bwd_inner_microstep: 4781.85 | bwd_allreduce_microstep: 23.69 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2129 [2024-07-29 21:02:39,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-29 21:02:39,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3475.68 | bwd_microstep: 5035.88 | bwd_inner_microstep: 4645.80 | bwd_allreduce_microstep: 390.02 | step_microstep: 181.85 [2024-07-29 21:02:39,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27926.14 | bwd: 40011.56 | bwd_inner: 39007.54 | bwd_allreduce: 1003.55 | step: 182.51 71%|███████ | 478/671 [9:19:25<3:42:56, 69.31s/it] {'loss': 1.0997, 'learning_rate': 4.044220172479675e-06, 'epoch': 0.71} 71%|███████ | 478/671 [9:19:25<3:42:56, 69.31s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2312 [2024-07-29 21:02:48,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.79 | bwd_microstep: 5201.58 | bwd_inner_microstep: 4800.24 | bwd_allreduce_microstep: 401.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3823 [2024-07-29 21:02:57,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.02 | bwd_microstep: 5194.03 | bwd_inner_microstep: 5144.28 | bwd_allreduce_microstep: 49.68 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3779 [2024-07-29 21:03:06,407] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 3780.13 | bwd_microstep: 5037.15 | bwd_inner_microstep: 5014.74 | bwd_allreduce_microstep: 22.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622 [2024-07-29 21:03:15,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.28 | bwd_microstep: 5092.03 | bwd_inner_microstep: 5028.35 | bwd_allreduce_microstep: 63.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-29 21:03:23,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.60 | bwd_microstep: 5170.87 | bwd_inner_microstep: 5103.00 | bwd_allreduce_microstep: 67.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-29 21:03:32,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.42 | bwd_microstep: 5112.78 | bwd_inner_microstep: 5045.33 | bwd_allreduce_microstep: 67.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-07-29 21:03:41,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.24 | bwd_microstep: 5041.13 | bwd_inner_microstep: 4986.01 | bwd_allreduce_microstep: 55.06 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3693 [2024-07-29 21:03:50,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 21:03:50,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.04 | bwd_microstep: 5162.38 | bwd_inner_microstep: 5077.56 | bwd_allreduce_microstep: 84.75 | step_microstep: 181.88 [2024-07-29 21:03:50,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28852.42 | bwd: 41011.92 | bwd_inner: 40199.45 | bwd_allreduce: 812.01 | step: 182.44 71%|███████▏ | 479/671 [9:20:36<3:42:37, 69.57s/it] {'loss': 1.0926, 'learning_rate': 4.0054647561062625e-06, 
'epoch': 0.71} 71%|███████▏ | 479/671 [9:20:36<3:42:37, 69.57s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3903 [2024-07-29 21:03:59,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3800.04 | bwd_microstep: 5203.93 | bwd_inner_microstep: 5179.94 | bwd_allreduce_microstep: 23.92 | step_microstep: 0.08 dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 732 [2024-07-29 21:04:08,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.49 | bwd_microstep: 5349.21 | bwd_inner_microstep: 4939.13 | bwd_allreduce_microstep: 410.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2263 [2024-07-29 21:04:16,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.20 | bwd_microstep: 5230.78 | bwd_inner_microstep: 4824.15 | bwd_allreduce_microstep: 406.55 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-07-29 21:04:25,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.72 | bwd_microstep: 5252.58 | bwd_inner_microstep: 4845.36 | bwd_allreduce_microstep: 407.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3792 [2024-07-29 21:04:33,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3230.57 | bwd_microstep: 4830.24 | bwd_inner_microstep: 4809.60 | bwd_allreduce_microstep: 20.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-29 21:04:42,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.25 | bwd_microstep: 5147.79 | bwd_inner_microstep: 5074.61 | bwd_allreduce_microstep: 73.12 | step_microstep: 0.07 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-29 21:04:51,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3758.11 | bwd_microstep: 4984.70 | bwd_inner_microstep: 4965.37 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-07-29 21:05:00,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 21:05:00,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.20 | bwd_microstep: 5007.57 | bwd_inner_microstep: 4953.71 | bwd_allreduce_microstep: 53.80 | step_microstep: 181.14 [2024-07-29 21:05:00,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28591.50 | bwd: 41006.79 | bwd_inner: 39591.80 | bwd_allreduce: 1414.50 | step: 181.81 72%|███████▏ | 480/671 [9:21:46<3:41:48, 69.68s/it] {'loss': 1.1799, 'learning_rate': 3.9668493718659924e-06, 'epoch': 0.71} 72%|███████▏ | 480/671 [9:21:46<3:41:48, 69.68s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3832 [2024-07-29 21:05:09,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3804.25 | bwd_microstep: 5227.09 | bwd_inner_microstep: 5187.92 | bwd_allreduce_microstep: 39.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-29 21:05:17,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3239.29 | bwd_microstep: 4925.12 | bwd_inner_microstep: 4871.73 | bwd_allreduce_microstep: 53.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3832 [2024-07-29 21:05:26,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3769.06 | bwd_microstep: 5049.76 | bwd_inner_microstep: 5030.40 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736 [2024-07-29 21:05:34,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.59 | bwd_microstep: 5006.26 | bwd_inner_microstep: 4986.90 
| bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740 [2024-07-29 21:05:43,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.15 | bwd_microstep: 4999.46 | bwd_inner_microstep: 4980.09 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3776 [2024-07-29 21:05:51,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.25 | bwd_microstep: 4817.95 | bwd_inner_microstep: 4798.50 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-29 21:06:00,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.42 | bwd_microstep: 5156.55 | bwd_inner_microstep: 5086.56 | bwd_allreduce_microstep: 69.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-29 21:06:09,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 21:06:09,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.69 | bwd_microstep: 5077.25 | bwd_inner_microstep: 5015.98 | bwd_allreduce_microstep: 61.21 | step_microstep: 183.04 [2024-07-29 21:06:09,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28719.61 | bwd: 40259.44 | bwd_inner: 39958.01 | bwd_allreduce: 300.94 | step: 183.62 72%|███████▏ | 481/671 [9:22:55<3:40:18, 69.57s/it] {'loss': 1.1347, 'learning_rate': 3.9283749218128885e-06, 'epoch': 0.72} 72%|███████▏ | 481/671 [9:22:55<3:40:18, 69.57s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2389 [2024-07-29 21:06:18,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.11 | bwd_microstep: 5277.49 | bwd_inner_microstep: 4873.94 | bwd_allreduce_microstep: 403.49 | step_microstep: 0.08 dynamic ViT batch size: 
26, images per sample: 13.0, dynamic token length: 3883 [2024-07-29 21:06:27,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3772.71 | bwd_microstep: 5124.86 | bwd_inner_microstep: 5105.47 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3605 [2024-07-29 21:06:36,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.36 | bwd_microstep: 5236.55 | bwd_inner_microstep: 5151.70 | bwd_allreduce_microstep: 84.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3755 [2024-07-29 21:06:44,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.36 | bwd_microstep: 5115.11 | bwd_inner_microstep: 5069.09 | bwd_allreduce_microstep: 45.95 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2111 [2024-07-29 21:06:53,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.63 | bwd_microstep: 5224.09 | bwd_inner_microstep: 4819.29 | bwd_allreduce_microstep: 404.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-29 21:07:02,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.75 | bwd_microstep: 5191.97 | bwd_inner_microstep: 5119.48 | bwd_allreduce_microstep: 72.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-07-29 21:07:11,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.51 | bwd_microstep: 5180.87 | bwd_inner_microstep: 5125.73 | bwd_allreduce_microstep: 55.08 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712 [2024-07-29 21:07:20,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-29 21:07:20,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3746.09 | bwd_microstep: 5056.86 | bwd_inner_microstep: 5018.33 | bwd_allreduce_microstep: 38.47 | step_microstep: 181.73 [2024-07-29 21:07:20,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29084.42 | bwd: 41407.80 | bwd_inner: 40282.97 | bwd_allreduce: 1124.37 | step: 182.30 72%|███████▏ | 482/671 [9:24:06<3:40:19, 69.95s/it] {'loss': 1.1829, 'learning_rate': 3.890042304708758e-06, 'epoch': 0.72} 72%|███████▏ | 482/671 [9:24:06<3:40:19, 69.95s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4028 [2024-07-29 21:07:29,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.63 | bwd_microstep: 5366.22 | bwd_inner_microstep: 5323.54 | bwd_allreduce_microstep: 42.62 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3068 [2024-07-29 21:07:38,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.49 | bwd_microstep: 5247.76 | bwd_inner_microstep: 4873.92 | bwd_allreduce_microstep: 373.77 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3787 [2024-07-29 21:07:46,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.82 | bwd_microstep: 5022.52 | bwd_inner_microstep: 5003.06 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.09 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2076 [2024-07-29 21:07:55,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3511.38 | bwd_microstep: 5162.57 | bwd_inner_microstep: 4762.48 | bwd_allreduce_microstep: 400.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3764 [2024-07-29 21:08:04,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.99 | bwd_microstep: 5026.78 | bwd_inner_microstep: 5005.20 | bwd_allreduce_microstep: 21.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 
10.0, dynamic token length: 3671 [2024-07-29 21:08:13,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.14 | bwd_microstep: 5038.94 | bwd_inner_microstep: 4977.95 | bwd_allreduce_microstep: 60.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684 [2024-07-29 21:08:21,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.77 | bwd_microstep: 5065.79 | bwd_inner_microstep: 5007.28 | bwd_allreduce_microstep: 58.45 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175 [2024-07-29 21:08:29,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 21:08:29,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3031.48 | bwd_microstep: 4944.95 | bwd_inner_microstep: 4566.26 | bwd_allreduce_microstep: 378.63 | step_microstep: 181.52 [2024-07-29 21:08:29,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28479.59 | bwd: 40875.53 | bwd_inner: 39519.63 | bwd_allreduce: 1355.42 | step: 182.11 72%|███████▏ | 483/671 [9:25:15<3:38:55, 69.87s/it] {'loss': 1.1717, 'learning_rate': 3.8518524160021876e-06, 'epoch': 0.72} 72%|███████▏ | 483/671 [9:25:15<3:38:55, 69.87s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2384 [2024-07-29 21:08:38,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3096.56 | bwd_microstep: 5354.07 | bwd_inner_microstep: 4962.75 | bwd_allreduce_microstep: 391.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3550 [2024-07-29 21:08:47,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.75 | bwd_microstep: 5066.82 | bwd_inner_microstep: 4994.11 | bwd_allreduce_microstep: 72.64 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3805 [2024-07-29 21:08:55,830] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.47 | bwd_microstep: 5173.75 | bwd_inner_microstep: 5122.79 | bwd_allreduce_microstep: 50.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-29 21:09:04,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.19 | bwd_microstep: 5140.81 | bwd_inner_microstep: 5086.02 | bwd_allreduce_microstep: 54.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-29 21:09:12,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3213.38 | bwd_microstep: 4825.05 | bwd_inner_microstep: 4778.79 | bwd_allreduce_microstep: 46.20 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3704 [2024-07-29 21:09:21,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.37 | bwd_microstep: 5050.53 | bwd_inner_microstep: 4978.50 | bwd_allreduce_microstep: 71.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3680 [2024-07-29 21:09:29,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.15 | bwd_microstep: 4892.00 | bwd_inner_microstep: 4866.46 | bwd_allreduce_microstep: 25.48 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3653 [2024-07-29 21:09:38,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 21:09:38,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.17 | bwd_microstep: 5067.82 | bwd_inner_microstep: 4997.74 | bwd_allreduce_microstep: 70.01 | step_microstep: 180.90 [2024-07-29 21:09:38,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27952.95 | bwd: 40570.84 | bwd_inner: 39787.08 | bwd_allreduce: 783.29 | step: 181.58 72%|███████▏ | 484/671 [9:26:24<3:36:48, 69.56s/it] {'loss': 1.1431, 
'learning_rate': 3.813806147807645e-06, 'epoch': 0.72} 72%|███████▏ | 484/671 [9:26:24<3:36:48, 69.56s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-29 21:09:47,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.79 | bwd_microstep: 5201.63 | bwd_inner_microstep: 5128.67 | bwd_allreduce_microstep: 72.90 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2273 [2024-07-29 21:09:56,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.70 | bwd_microstep: 5224.78 | bwd_inner_microstep: 4819.65 | bwd_allreduce_microstep: 405.07 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2204 [2024-07-29 21:10:05,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.33 | bwd_microstep: 5160.95 | bwd_inner_microstep: 4759.25 | bwd_allreduce_microstep: 401.64 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3753 [2024-07-29 21:10:13,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3109.63 | bwd_microstep: 4965.08 | bwd_inner_microstep: 4926.62 | bwd_allreduce_microstep: 38.39 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3697 [2024-07-29 21:10:21,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3699.95 | bwd_microstep: 4971.10 | bwd_inner_microstep: 4938.13 | bwd_allreduce_microstep: 32.90 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-29 21:10:30,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.16 | bwd_microstep: 4986.78 | bwd_inner_microstep: 4967.36 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3718 [2024-07-29 21:10:39,302] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.63 | bwd_microstep: 4931.65 | bwd_inner_microstep: 4900.13 | bwd_allreduce_microstep: 31.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3685 [2024-07-29 21:10:48,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.79 [2024-07-29 21:10:48,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.60 | bwd_microstep: 4957.82 | bwd_inner_microstep: 4928.09 | bwd_allreduce_microstep: 29.67 | step_microstep: 504.23 [2024-07-29 21:10:48,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28671.68 | bwd: 40399.78 | bwd_inner: 39367.84 | bwd_allreduce: 1031.47 | step: 504.79 72%|███████▏ | 485/671 [9:27:34<3:35:47, 69.61s/it] {'loss': 1.1427, 'learning_rate': 3.775904388884615e-06, 'epoch': 0.72} 72%|███████▏ | 485/671 [9:27:34<3:35:47, 69.61s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3656 [2024-07-29 21:10:57,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.84 | bwd_microstep: 5114.40 | bwd_inner_microstep: 5052.83 | bwd_allreduce_microstep: 61.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3865 [2024-07-29 21:11:06,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.76 | bwd_microstep: 5317.91 | bwd_inner_microstep: 5256.83 | bwd_allreduce_microstep: 61.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3592 [2024-07-29 21:11:15,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.83 | bwd_microstep: 5173.33 | bwd_inner_microstep: 5096.36 | bwd_allreduce_microstep: 76.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-29 21:11:23,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.56 | 
bwd_microstep: 5101.02 | bwd_inner_microstep: 5030.91 | bwd_allreduce_microstep: 70.05 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2100 [2024-07-29 21:11:32,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.24 | bwd_microstep: 5230.86 | bwd_inner_microstep: 4824.87 | bwd_allreduce_microstep: 405.92 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707 [2024-07-29 21:11:41,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.41 | bwd_microstep: 4921.53 | bwd_inner_microstep: 4898.22 | bwd_allreduce_microstep: 23.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-29 21:11:49,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.71 | bwd_microstep: 5165.56 | bwd_inner_microstep: 5089.29 | bwd_allreduce_microstep: 76.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-29 21:11:59,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 21:11:59,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3693.39 | bwd_microstep: 5254.77 | bwd_inner_microstep: 5112.37 | bwd_allreduce_microstep: 142.33 | step_microstep: 181.91 [2024-07-29 21:11:59,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28994.65 | bwd: 41279.35 | bwd_inner: 40361.62 | bwd_allreduce: 917.26 | step: 182.48 72%|███████▏ | 486/671 [9:28:45<3:35:33, 69.91s/it] {'loss': 1.1423, 'learning_rate': 3.7381480246168665e-06, 'epoch': 0.72} 72%|███████▏ | 486/671 [9:28:45<3:35:33, 69.91s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3964 [2024-07-29 21:12:08,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3814.64 | bwd_microstep: 5184.42 | bwd_inner_microstep: 5165.31 | bwd_allreduce_microstep: 
19.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3898 [2024-07-29 21:12:16,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.75 | bwd_microstep: 5206.13 | bwd_inner_microstep: 5159.10 | bwd_allreduce_microstep: 46.96 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3600 [2024-07-29 21:12:25,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.30 | bwd_microstep: 5156.33 | bwd_inner_microstep: 5081.52 | bwd_allreduce_microstep: 74.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-29 21:12:33,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3203.69 | bwd_microstep: 4832.37 | bwd_inner_microstep: 4789.58 | bwd_allreduce_microstep: 42.72 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2179 [2024-07-29 21:12:42,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.13 | bwd_microstep: 5109.12 | bwd_inner_microstep: 4709.35 | bwd_allreduce_microstep: 399.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-29 21:12:51,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.35 | bwd_microstep: 5058.85 | bwd_inner_microstep: 4997.72 | bwd_allreduce_microstep: 61.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-29 21:12:59,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.86 | bwd_microstep: 5005.15 | bwd_inner_microstep: 4953.83 | bwd_allreduce_microstep: 51.26 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-07-29 21:13:08,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-29 21:13:08,674] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.32 | bwd_microstep: 5064.63 | bwd_inner_microstep: 5007.41 | bwd_allreduce_microstep: 57.16 | step_microstep: 181.28 [2024-07-29 21:13:08,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28617.95 | bwd: 40616.97 | bwd_inner: 39863.75 | bwd_allreduce: 752.74 | step: 181.97 73%|███████▎ | 487/671 [9:29:54<3:34:04, 69.81s/it] {'loss': 1.1204, 'learning_rate': 3.700537936991733e-06, 'epoch': 0.72} 73%|███████▎ | 487/671 [9:29:54<3:34:04, 69.81s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2404 [2024-07-29 21:13:17,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.33 | bwd_microstep: 5474.27 | bwd_inner_microstep: 5061.57 | bwd_allreduce_microstep: 412.63 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3882 [2024-07-29 21:13:26,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3799.78 | bwd_microstep: 5124.90 | bwd_inner_microstep: 5105.57 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3622 [2024-07-29 21:13:35,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.15 | bwd_microstep: 5103.93 | bwd_inner_microstep: 5016.09 | bwd_allreduce_microstep: 87.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737 [2024-07-29 21:13:44,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.15 | bwd_microstep: 4983.91 | bwd_inner_microstep: 4964.58 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.07 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3741 [2024-07-29 21:13:52,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.36 | bwd_microstep: 5125.45 | bwd_inner_microstep: 5048.36 | bwd_allreduce_microstep: 77.03 | step_microstep: 
0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3718 [2024-07-29 21:14:00,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3207.64 | bwd_microstep: 4787.95 | bwd_inner_microstep: 4768.62 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2126 [2024-07-29 21:14:09,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.73 | bwd_microstep: 5127.70 | bwd_inner_microstep: 4729.50 | bwd_allreduce_microstep: 398.13 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2146 [2024-07-29 21:14:18,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 21:14:18,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3468.60 | bwd_microstep: 5044.30 | bwd_inner_microstep: 4651.48 | bwd_allreduce_microstep: 392.75 | step_microstep: 180.59 [2024-07-29 21:14:18,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28572.63 | bwd: 40772.38 | bwd_inner: 39345.72 | bwd_allreduce: 1426.18 | step: 181.17 73%|███████▎ | 488/671 [9:31:04<3:32:47, 69.77s/it] {'loss': 1.1612, 'learning_rate': 3.6630750045795506e-06, 'epoch': 0.73} 73%|███████▎ | 488/671 [9:31:04<3:32:47, 69.77s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3822 [2024-07-29 21:14:26,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.08 | bwd_microstep: 4954.74 | bwd_inner_microstep: 4930.12 | bwd_allreduce_microstep: 24.56 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2256 [2024-07-29 21:14:35,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.81 | bwd_microstep: 5133.78 | bwd_inner_microstep: 4733.76 | bwd_allreduce_microstep: 399.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 
3766 [2024-07-29 21:14:44,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.75 | bwd_microstep: 4993.95 | bwd_inner_microstep: 4974.59 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2084 [2024-07-29 21:14:52,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.92 | bwd_microstep: 5176.19 | bwd_inner_microstep: 4773.85 | bwd_allreduce_microstep: 402.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625 [2024-07-29 21:15:01,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.85 | bwd_microstep: 5105.21 | bwd_inner_microstep: 5035.35 | bwd_allreduce_microstep: 69.79 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-07-29 21:15:10,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3644.78 | bwd_microstep: 5166.17 | bwd_inner_microstep: 4765.46 | bwd_allreduce_microstep: 400.65 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3730 [2024-07-29 21:15:18,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3504.75 | bwd_microstep: 4963.55 | bwd_inner_microstep: 4915.11 | bwd_allreduce_microstep: 48.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3685 [2024-07-29 21:15:28,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 21:15:28,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.66 | bwd_microstep: 5070.65 | bwd_inner_microstep: 4998.25 | bwd_allreduce_microstep: 72.34 | step_microstep: 534.87 [2024-07-29 21:15:28,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28476.51 | bwd: 40564.22 | bwd_inner: 39126.44 | bwd_allreduce: 1437.31 | step: 535.43 73%|███████▎ | 489/671 
[9:32:14<3:31:34, 69.75s/it] {'loss': 1.1158, 'learning_rate': 3.625760102513103e-06, 'epoch': 0.73} 73%|███████▎ | 489/671 [9:32:14<3:31:34, 69.75s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3833 [2024-07-29 21:15:37,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3790.62 | bwd_microstep: 5123.00 | bwd_inner_microstep: 5095.35 | bwd_allreduce_microstep: 27.58 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3828 [2024-07-29 21:15:45,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3778.94 | bwd_microstep: 5152.85 | bwd_inner_microstep: 5122.63 | bwd_allreduce_microstep: 30.15 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2289 [2024-07-29 21:15:54,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.51 | bwd_microstep: 5199.35 | bwd_inner_microstep: 4793.33 | bwd_allreduce_microstep: 405.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768 [2024-07-29 21:16:03,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3771.58 | bwd_microstep: 5070.69 | bwd_inner_microstep: 5042.77 | bwd_allreduce_microstep: 27.85 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3699 [2024-07-29 21:16:12,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.99 | bwd_microstep: 5109.50 | bwd_inner_microstep: 5022.28 | bwd_allreduce_microstep: 87.16 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-07-29 21:16:21,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.46 | bwd_microstep: 5167.50 | bwd_inner_microstep: 5111.48 | bwd_allreduce_microstep: 55.96 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709 
[2024-07-29 21:16:29,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3665.14 | bwd_microstep: 4891.55 | bwd_inner_microstep: 4872.20 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3738 [2024-07-29 21:16:38,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-29 21:16:38,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.98 | bwd_microstep: 5006.52 | bwd_inner_microstep: 4987.13 | bwd_allreduce_microstep: 19.30 | step_microstep: 182.06 [2024-07-29 21:16:38,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29466.12 | bwd: 40720.94 | bwd_inner: 40047.13 | bwd_allreduce: 673.33 | step: 182.65 73%|███████▎ | 490/671 [9:33:24<3:31:06, 69.98s/it] {'loss': 1.1642, 'learning_rate': 3.5885941024672e-06, 'epoch': 0.73} 73%|███████▎ | 490/671 [9:33:24<3:31:06, 69.98s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2314 [2024-07-29 21:16:47,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.03 | bwd_microstep: 5238.74 | bwd_inner_microstep: 4835.13 | bwd_allreduce_microstep: 403.55 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2306 [2024-07-29 21:16:56,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.63 | bwd_microstep: 5212.41 | bwd_inner_microstep: 4808.23 | bwd_allreduce_microstep: 404.11 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3741 [2024-07-29 21:17:04,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3210.55 | bwd_microstep: 4798.77 | bwd_inner_microstep: 4779.39 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3765 [2024-07-29 21:17:12,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3626.76 | bwd_microstep: 5176.62 | bwd_inner_microstep: 5119.62 | bwd_allreduce_microstep: 56.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3619 [2024-07-29 21:17:21,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3235.96 | bwd_microstep: 4837.29 | bwd_inner_microstep: 4793.01 | bwd_allreduce_microstep: 44.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 [2024-07-29 21:17:29,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3850.49 | bwd_microstep: 4905.32 | bwd_inner_microstep: 4881.10 | bwd_allreduce_microstep: 24.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-07-29 21:17:38,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.97 | bwd_microstep: 5172.87 | bwd_inner_microstep: 5099.34 | bwd_allreduce_microstep: 73.46 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175 [2024-07-29 21:17:47,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 21:17:47,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3013.36 | bwd_microstep: 4937.16 | bwd_inner_microstep: 4557.14 | bwd_allreduce_microstep: 379.96 | step_microstep: 434.00 [2024-07-29 21:17:47,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27604.66 | bwd: 40279.16 | bwd_inner: 38872.89 | bwd_allreduce: 1405.80 | step: 434.69 73%|███████▎ | 491/671 [9:34:33<3:28:35, 69.53s/it] {'loss': 1.146, 'learning_rate': 3.5515778726382933e-06, 'epoch': 0.73} 73%|███████▎ | 491/671 [9:34:33<3:28:35, 69.53s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3946 [2024-07-29 21:17:56,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3697.12 | bwd_microstep: 5315.08 | bwd_inner_microstep: 5260.06 | 
bwd_allreduce_microstep: 54.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3582 [2024-07-29 21:18:04,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3223.53 | bwd_microstep: 4888.32 | bwd_inner_microstep: 4829.78 | bwd_allreduce_microstep: 58.47 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2188 [2024-07-29 21:18:12,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.67 | bwd_microstep: 5199.69 | bwd_inner_microstep: 4795.41 | bwd_allreduce_microstep: 404.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3727 [2024-07-29 21:18:21,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.03 | bwd_microstep: 5036.16 | bwd_inner_microstep: 5016.81 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-29 21:18:30,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3481.30 | bwd_microstep: 5058.51 | bwd_inner_microstep: 4665.15 | bwd_allreduce_microstep: 393.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3709 [2024-07-29 21:18:38,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.98 | bwd_microstep: 5030.59 | bwd_inner_microstep: 4957.70 | bwd_allreduce_microstep: 72.83 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2158 [2024-07-29 21:18:47,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.11 | bwd_microstep: 5110.21 | bwd_inner_microstep: 4714.44 | bwd_allreduce_microstep: 395.70 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2146 [2024-07-29 21:18:56,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 
[2024-07-29 21:18:56,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.41 | bwd_microstep: 5125.45 | bwd_inner_microstep: 4726.87 | bwd_allreduce_microstep: 398.51 | step_microstep: 182.11 [2024-07-29 21:18:56,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28309.06 | bwd: 40763.98 | bwd_inner: 38966.15 | bwd_allreduce: 1797.36 | step: 182.67 73%|███████▎ | 492/671 [9:35:42<3:27:18, 69.49s/it] {'loss': 1.1832, 'learning_rate': 3.5147122777242203e-06, 'epoch': 0.73} 73%|███████▎ | 492/671 [9:35:42<3:27:18, 69.49s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3817 [2024-07-29 21:19:05,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3674.13 | bwd_microstep: 5519.41 | bwd_inner_microstep: 5433.23 | bwd_allreduce_microstep: 86.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2261 [2024-07-29 21:19:14,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.19 | bwd_microstep: 5274.59 | bwd_inner_microstep: 4866.89 | bwd_allreduce_microstep: 407.63 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3791 [2024-07-29 21:19:23,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.01 | bwd_microstep: 5025.35 | bwd_inner_microstep: 5006.03 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2108 [2024-07-29 21:19:32,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.00 | bwd_microstep: 5224.81 | bwd_inner_microstep: 4817.16 | bwd_allreduce_microstep: 407.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-29 21:19:40,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.58 | bwd_microstep: 5181.93 | bwd_inner_microstep: 5101.19 | 
bwd_allreduce_microstep: 80.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3718 [2024-07-29 21:19:49,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.02 | bwd_microstep: 5032.45 | bwd_inner_microstep: 4990.91 | bwd_allreduce_microstep: 41.46 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-29 21:19:58,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3508.57 | bwd_microstep: 5112.44 | bwd_inner_microstep: 4717.61 | bwd_allreduce_microstep: 394.76 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-07-29 21:20:06,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-29 21:20:06,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.91 | bwd_microstep: 5074.48 | bwd_inner_microstep: 4681.33 | bwd_allreduce_microstep: 393.08 | step_microstep: 180.65 [2024-07-29 21:20:06,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28675.32 | bwd: 41445.43 | bwd_inner: 39614.30 | bwd_allreduce: 1830.65 | step: 181.21 73%|███████▎ | 493/671 [9:36:52<3:27:00, 69.78s/it] {'loss': 1.1724, 'learning_rate': 3.477998178903982e-06, 'epoch': 0.73} 73%|███████▎ | 493/671 [9:36:52<3:27:00, 69.78s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3577 [2024-07-29 21:20:15,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3223.17 | bwd_microstep: 5146.02 | bwd_inner_microstep: 5087.92 | bwd_allreduce_microstep: 58.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3584 [2024-07-29 21:20:23,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.38 | bwd_microstep: 5132.13 | bwd_inner_microstep: 5056.14 | bwd_allreduce_microstep: 75.93 | step_microstep: 0.08 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3804 [2024-07-29 21:20:32,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.46 | bwd_microstep: 5173.67 | bwd_inner_microstep: 5122.81 | bwd_allreduce_microstep: 50.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3748 [2024-07-29 21:20:41,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.39 | bwd_microstep: 5155.31 | bwd_inner_microstep: 5101.79 | bwd_allreduce_microstep: 53.45 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-29 21:20:50,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.29 | bwd_microstep: 5121.58 | bwd_inner_microstep: 5052.73 | bwd_allreduce_microstep: 68.78 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3731 [2024-07-29 21:20:59,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.67 | bwd_microstep: 5133.40 | bwd_inner_microstep: 5079.42 | bwd_allreduce_microstep: 53.90 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755 [2024-07-29 21:21:07,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.78 | bwd_microstep: 5028.42 | bwd_inner_microstep: 5006.48 | bwd_allreduce_microstep: 21.88 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2143 [2024-07-29 21:21:16,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-29 21:21:16,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3481.97 | bwd_microstep: 5071.50 | bwd_inner_microstep: 4678.35 | bwd_allreduce_microstep: 393.08 | step_microstep: 181.87 [2024-07-29 21:21:16,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28431.01 | bwd: 40961.99 | bwd_inner: 40185.58 | bwd_allreduce: 775.95 | 
step: 182.44 74%|███████▎ | 494/671 [9:38:02<3:25:48, 69.76s/it] {'loss': 1.1447, 'learning_rate': 3.4414364338176376e-06, 'epoch': 0.74} 74%|███████▎ | 494/671 [9:38:02<3:25:48, 69.76s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3616 [2024-07-29 21:21:25,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.47 | bwd_microstep: 5362.73 | bwd_inner_microstep: 5253.71 | bwd_allreduce_microstep: 108.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3818 [2024-07-29 21:21:34,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3782.96 | bwd_microstep: 5141.13 | bwd_inner_microstep: 5108.97 | bwd_allreduce_microstep: 32.08 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3829 [2024-07-29 21:21:42,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3241.83 | bwd_microstep: 4853.35 | bwd_inner_microstep: 4833.94 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3759 [2024-07-29 21:21:51,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.45 | bwd_microstep: 5170.64 | bwd_inner_microstep: 5116.78 | bwd_allreduce_microstep: 53.79 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2218 [2024-07-29 21:22:00,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.67 | bwd_microstep: 5215.98 | bwd_inner_microstep: 4811.93 | bwd_allreduce_microstep: 403.99 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721 [2024-07-29 21:22:08,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.19 | bwd_microstep: 4893.83 | bwd_inner_microstep: 4866.44 | bwd_allreduce_microstep: 27.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per 
sample: 10.0, dynamic token length: 3639 [2024-07-29 21:22:17,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.54 | bwd_microstep: 5216.11 | bwd_inner_microstep: 5133.73 | bwd_allreduce_microstep: 82.32 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154 [2024-07-29 21:22:26,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 21:22:26,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.07 | bwd_microstep: 5130.59 | bwd_inner_microstep: 4732.43 | bwd_allreduce_microstep: 398.10 | step_microstep: 180.65 [2024-07-29 21:22:26,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28603.08 | bwd: 40984.35 | bwd_inner: 39857.87 | bwd_allreduce: 1126.02 | step: 181.24 74%|███████▍ | 495/671 [9:39:12<3:24:46, 69.81s/it] {'loss': 1.1156, 'learning_rate': 3.405027896546277e-06, 'epoch': 0.74} 74%|███████▍ | 495/671 [9:39:12<3:24:46, 69.81s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2316 [2024-07-29 21:22:34,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3096.78 | bwd_microstep: 5128.57 | bwd_inner_microstep: 4736.28 | bwd_allreduce_microstep: 392.22 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2266 [2024-07-29 21:22:43,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.94 | bwd_microstep: 5276.48 | bwd_inner_microstep: 4867.10 | bwd_allreduce_microstep: 409.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3618 [2024-07-29 21:22:52,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.24 | bwd_microstep: 5100.59 | bwd_inner_microstep: 5027.46 | bwd_allreduce_microstep: 73.07 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3750 [2024-07-29 21:23:01,098] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.35 | bwd_microstep: 4992.88 | bwd_inner_microstep: 4973.44 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-07-29 21:23:09,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3225.75 | bwd_microstep: 4838.78 | bwd_inner_microstep: 4813.96 | bwd_allreduce_microstep: 24.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-07-29 21:23:17,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.72 | bwd_microstep: 5022.56 | bwd_inner_microstep: 4997.00 | bwd_allreduce_microstep: 25.49 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3726 [2024-07-29 21:23:25,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3109.96 | bwd_microstep: 4790.92 | bwd_inner_microstep: 4760.69 | bwd_allreduce_microstep: 30.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-29 21:23:34,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 21:23:34,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.97 | bwd_microstep: 5025.25 | bwd_inner_microstep: 4974.03 | bwd_allreduce_microstep: 51.14 | step_microstep: 180.95 [2024-07-29 21:23:34,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27611.61 | bwd: 40176.00 | bwd_inner: 39149.89 | bwd_allreduce: 1025.59 | step: 181.53 74%|███████▍ | 496/671 [9:40:20<3:22:07, 69.30s/it] {'loss': 1.1168, 'learning_rate': 3.368773417592047e-06, 'epoch': 0.74} 74%|███████▍ | 496/671 [9:40:20<3:22:07, 69.30s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3901 [2024-07-29 21:23:43,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.72 | 
bwd_microstep: 5061.41 | bwd_inner_microstep: 5030.41 | bwd_allreduce_microstep: 30.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3845 [2024-07-29 21:23:52,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.10 | bwd_microstep: 5159.42 | bwd_inner_microstep: 5114.85 | bwd_allreduce_microstep: 44.51 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735 [2024-07-29 21:24:01,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.32 | bwd_microstep: 5099.09 | bwd_inner_microstep: 5040.92 | bwd_allreduce_microstep: 58.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3764 [2024-07-29 21:24:09,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.87 | bwd_microstep: 5005.05 | bwd_inner_microstep: 4985.59 | bwd_allreduce_microstep: 19.40 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2211 [2024-07-29 21:24:18,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.60 | bwd_microstep: 5186.84 | bwd_inner_microstep: 4781.97 | bwd_allreduce_microstep: 404.80 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3660 [2024-07-29 21:24:27,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.53 | bwd_microstep: 5209.36 | bwd_inner_microstep: 5107.63 | bwd_allreduce_microstep: 101.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3733 [2024-07-29 21:24:36,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.76 | bwd_microstep: 5090.27 | bwd_inner_microstep: 5045.99 | bwd_allreduce_microstep: 44.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695 [2024-07-29 21:24:44,893] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 21:24:44,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.21 | bwd_microstep: 5047.90 | bwd_inner_microstep: 4990.34 | bwd_allreduce_microstep: 57.49 | step_microstep: 180.74 [2024-07-29 21:24:44,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29033.01 | bwd: 40859.32 | bwd_inner: 40097.64 | bwd_allreduce: 761.21 | step: 181.30 74%|███████▍ | 497/671 [9:41:30<3:21:46, 69.58s/it] {'loss': 1.1344, 'learning_rate': 3.3326738438583116e-06, 'epoch': 0.74} 74%|███████▍ | 497/671 [9:41:30<3:21:46, 69.58s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2438 [2024-07-29 21:24:53,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.49 | bwd_microstep: 5195.70 | bwd_inner_microstep: 4794.65 | bwd_allreduce_microstep: 400.98 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2249 [2024-07-29 21:25:02,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.09 | bwd_microstep: 5276.16 | bwd_inner_microstep: 4867.49 | bwd_allreduce_microstep: 408.60 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2244 [2024-07-29 21:25:11,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.33 | bwd_microstep: 5173.46 | bwd_inner_microstep: 4769.69 | bwd_allreduce_microstep: 403.71 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3612 [2024-07-29 21:25:20,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.94 | bwd_microstep: 5150.37 | bwd_inner_microstep: 5055.98 | bwd_allreduce_microstep: 94.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3734 [2024-07-29 21:25:28,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.41 | 
bwd_microstep: 5158.29 | bwd_inner_microstep: 5103.98 | bwd_allreduce_microstep: 54.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2199 [2024-07-29 21:25:37,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3481.52 | bwd_microstep: 5088.70 | bwd_inner_microstep: 4695.81 | bwd_allreduce_microstep: 392.83 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2198 [2024-07-29 21:25:45,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3494.85 | bwd_microstep: 5089.10 | bwd_inner_microstep: 4696.82 | bwd_allreduce_microstep: 392.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3686 [2024-07-29 21:25:54,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 21:25:54,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.55 | bwd_microstep: 5052.00 | bwd_inner_microstep: 4994.57 | bwd_allreduce_microstep: 57.37 | step_microstep: 181.99 [2024-07-29 21:25:54,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28379.09 | bwd: 41183.78 | bwd_inner: 38978.94 | bwd_allreduce: 2204.38 | step: 182.55 74%|███████▍ | 498/671 [9:42:40<3:20:53, 69.67s/it] {'loss': 1.1161, 'learning_rate': 3.2967300186298456e-06, 'epoch': 0.74} 74%|███████▍ | 498/671 [9:42:40<3:20:53, 69.67s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2293 [2024-07-29 21:26:03,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.82 | bwd_microstep: 5269.37 | bwd_inner_microstep: 4865.64 | bwd_allreduce_microstep: 403.66 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2246 [2024-07-29 21:26:12,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3479.84 | bwd_microstep: 5141.84 | bwd_inner_microstep: 4744.48 | bwd_allreduce_microstep: 
397.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2225 [2024-07-29 21:26:20,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.47 | bwd_microstep: 5158.49 | bwd_inner_microstep: 4758.79 | bwd_allreduce_microstep: 399.63 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3746 [2024-07-29 21:26:29,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.63 | bwd_microstep: 4996.29 | bwd_inner_microstep: 4975.83 | bwd_allreduce_microstep: 20.39 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2091 [2024-07-29 21:26:37,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3017.97 | bwd_microstep: 4922.79 | bwd_inner_microstep: 4541.63 | bwd_allreduce_microstep: 381.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-29 21:26:46,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.16 | bwd_microstep: 5125.39 | bwd_inner_microstep: 5056.27 | bwd_allreduce_microstep: 69.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3701 [2024-07-29 21:26:55,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.94 | bwd_microstep: 4960.13 | bwd_inner_microstep: 4928.46 | bwd_allreduce_microstep: 31.60 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3679 [2024-07-29 21:27:03,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 21:27:03,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3594.55 | bwd_microstep: 5091.60 | bwd_inner_microstep: 5018.71 | bwd_allreduce_microstep: 72.82 | step_microstep: 181.34 [2024-07-29 21:27:03,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28206.30 | bwd: 
40665.88 | bwd_inner: 38889.75 | bwd_allreduce: 1775.66 | step: 181.90 74%|███████▍ | 499/671 [9:43:49<3:19:18, 69.53s/it] {'loss': 1.1473, 'learning_rate': 3.2609427815531448e-06, 'epoch': 0.74} 74%|███████▍ | 499/671 [9:43:49<3:19:18, 69.53s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3749 [2024-07-29 21:27:12,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.96 | bwd_microstep: 5271.73 | bwd_inner_microstep: 5207.91 | bwd_allreduce_microstep: 63.75 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2272 [2024-07-29 21:27:21,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3340.57 | bwd_microstep: 4987.59 | bwd_inner_microstep: 4602.19 | bwd_allreduce_microstep: 385.33 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3793 [2024-07-29 21:27:30,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.06 | bwd_microstep: 5187.88 | bwd_inner_microstep: 5121.49 | bwd_allreduce_microstep: 66.32 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2181 [2024-07-29 21:27:38,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.09 | bwd_microstep: 5218.71 | bwd_inner_microstep: 4813.21 | bwd_allreduce_microstep: 405.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3652 [2024-07-29 21:27:46,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2908.57 | bwd_microstep: 4524.58 | bwd_inner_microstep: 4505.23 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735 [2024-07-29 21:27:54,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.68 | bwd_microstep: 5056.56 | bwd_inner_microstep: 5014.74 | bwd_allreduce_microstep: 41.75 | 
step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3670 [2024-07-29 21:28:03,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.46 | bwd_microstep: 5037.18 | bwd_inner_microstep: 4979.87 | bwd_allreduce_microstep: 57.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2136 [2024-07-29 21:28:12,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-29 21:28:12,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3490.05 | bwd_microstep: 5057.59 | bwd_inner_microstep: 4666.36 | bwd_allreduce_microstep: 391.16 | step_microstep: 180.96 [2024-07-29 21:28:12,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27710.35 | bwd: 40341.80 | bwd_inner: 38910.95 | bwd_allreduce: 1430.37 | step: 181.52 75%|███████▍ | 500/671 [9:44:58<3:17:10, 69.18s/it] {'loss': 1.0854, 'learning_rate': 3.2253129686168105e-06, 'epoch': 0.74} 75%|███████▍ | 500/671 [9:44:58<3:17:10, 69.18s/it]dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1351 [2024-07-29 21:28:21,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.62 | bwd_microstep: 5386.08 | bwd_inner_microstep: 4970.29 | bwd_allreduce_microstep: 415.73 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3582 [2024-07-29 21:28:29,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3180.12 | bwd_microstep: 4819.51 | bwd_inner_microstep: 4768.91 | bwd_allreduce_microstep: 50.54 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2064 [2024-07-29 21:28:37,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3006.97 | bwd_microstep: 4915.46 | bwd_inner_microstep: 4539.00 | bwd_allreduce_microstep: 376.40 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic 
token length: 3744 [2024-07-29 21:28:46,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.45 | bwd_microstep: 5094.19 | bwd_inner_microstep: 5050.77 | bwd_allreduce_microstep: 43.36 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2165 [2024-07-29 21:28:54,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3053.48 | bwd_microstep: 4998.43 | bwd_inner_microstep: 4613.44 | bwd_allreduce_microstep: 384.92 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2197 [2024-07-29 21:29:02,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.17 | bwd_microstep: 5210.08 | bwd_inner_microstep: 4803.80 | bwd_allreduce_microstep: 406.21 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2165 [2024-07-29 21:29:11,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.69 | bwd_microstep: 5108.54 | bwd_inner_microstep: 4712.39 | bwd_allreduce_microstep: 396.08 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1131 [2024-07-29 21:29:20,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 21:29:20,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3455.18 | bwd_microstep: 5144.91 | bwd_inner_microstep: 4748.21 | bwd_allreduce_microstep: 396.64 | step_microstep: 180.84 [2024-07-29 21:29:20,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 26952.58 | bwd: 40677.19 | bwd_inner: 38206.74 | bwd_allreduce: 2469.98 | step: 181.54 75%|███████▍ | 501/671 [9:46:06<3:14:58, 68.81s/it] {'loss': 1.1898, 'learning_rate': 3.18984141213203e-06, 'epoch': 0.75} 75%|███████▍ | 501/671 [9:46:06<3:14:58, 68.81s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2468 [2024-07-29 21:29:29,120] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 3548.41 | bwd_microstep: 5242.59 | bwd_inner_microstep: 4838.00 | bwd_allreduce_microstep: 404.52 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2279 [2024-07-29 21:29:38,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.80 | bwd_microstep: 5289.84 | bwd_inner_microstep: 4883.71 | bwd_allreduce_microstep: 406.06 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3758 [2024-07-29 21:29:46,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3707.34 | bwd_microstep: 4982.34 | bwd_inner_microstep: 4962.89 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2206 [2024-07-29 21:29:55,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.81 | bwd_microstep: 5211.07 | bwd_inner_microstep: 4805.49 | bwd_allreduce_microstep: 405.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721 [2024-07-29 21:30:03,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3213.73 | bwd_microstep: 4782.85 | bwd_inner_microstep: 4763.35 | bwd_allreduce_microstep: 19.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3721 [2024-07-29 21:30:12,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.81 | bwd_microstep: 5005.11 | bwd_inner_microstep: 4965.55 | bwd_allreduce_microstep: 39.49 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-07-29 21:30:20,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.60 | bwd_microstep: 5124.91 | bwd_inner_microstep: 4728.20 | bwd_allreduce_microstep: 396.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 
[2024-07-29 21:30:29,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 21:30:29,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.20 | bwd_microstep: 4983.42 | bwd_inner_microstep: 4934.25 | bwd_allreduce_microstep: 49.11 | step_microstep: 181.97 [2024-07-29 21:30:29,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28260.61 | bwd: 40622.12 | bwd_inner: 38881.38 | bwd_allreduce: 1740.26 | step: 182.56 75%|███████▍ | 502/671 [9:47:15<3:14:09, 68.93s/it] {'loss': 1.1258, 'learning_rate': 3.1545289407131128e-06, 'epoch': 0.75} 75%|███████▍ | 502/671 [9:47:15<3:14:09, 68.93s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2402 [2024-07-29 21:30:38,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.62 | bwd_microstep: 5235.16 | bwd_inner_microstep: 4833.35 | bwd_allreduce_microstep: 401.74 | step_microstep: 0.08 dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 730 [2024-07-29 21:30:47,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.98 | bwd_microstep: 5333.33 | bwd_inner_microstep: 4922.73 | bwd_allreduce_microstep: 410.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3814 [2024-07-29 21:30:56,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.67 | bwd_microstep: 5170.23 | bwd_inner_microstep: 5118.03 | bwd_allreduce_microstep: 52.13 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2209 [2024-07-29 21:31:04,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.90 | bwd_microstep: 5181.43 | bwd_inner_microstep: 4778.29 | bwd_allreduce_microstep: 403.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2202 [2024-07-29 21:31:13,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3562.08 | bwd_microstep: 5218.63 | bwd_inner_microstep: 4812.06 | bwd_allreduce_microstep: 406.50 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3639 [2024-07-29 21:31:22,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.65 | bwd_microstep: 5158.01 | bwd_inner_microstep: 5060.84 | bwd_allreduce_microstep: 97.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705 [2024-07-29 21:31:30,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.64 | bwd_microstep: 4923.45 | bwd_inner_microstep: 4899.12 | bwd_allreduce_microstep: 24.26 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2165 [2024-07-29 21:31:39,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 21:31:39,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3063.59 | bwd_microstep: 5015.80 | bwd_inner_microstep: 4627.65 | bwd_allreduce_microstep: 388.08 | step_microstep: 181.25 [2024-07-29 21:31:39,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28137.04 | bwd: 41236.01 | bwd_inner: 39052.00 | bwd_allreduce: 2183.53 | step: 181.81 75%|███████▍ | 503/671 [9:48:25<3:13:39, 69.16s/it] {'loss': 1.1519, 'learning_rate': 3.11937637925816e-06, 'epoch': 0.75} 75%|███████▍ | 503/671 [9:48:25<3:13:39, 69.16s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3600 [2024-07-29 21:31:48,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.33 | bwd_microstep: 5545.24 | bwd_inner_microstep: 5467.98 | bwd_allreduce_microstep: 77.19 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3821 [2024-07-29 21:31:57,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3796.42 | bwd_microstep: 5166.72 | bwd_inner_microstep: 5134.49 | 
bwd_allreduce_microstep: 32.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3613 [2024-07-29 21:32:06,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.36 | bwd_microstep: 5168.65 | bwd_inner_microstep: 5083.23 | bwd_allreduce_microstep: 85.36 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3724 [2024-07-29 21:32:14,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.17 | bwd_microstep: 5168.43 | bwd_inner_microstep: 5094.65 | bwd_allreduce_microstep: 73.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3641 [2024-07-29 21:32:23,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.50 | bwd_microstep: 5191.56 | bwd_inner_microstep: 5118.77 | bwd_allreduce_microstep: 72.72 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-29 21:32:32,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.15 | bwd_microstep: 5022.27 | bwd_inner_microstep: 4998.96 | bwd_allreduce_microstep: 23.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-07-29 21:32:41,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.24 | bwd_microstep: 4981.84 | bwd_inner_microstep: 4962.47 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-29 21:32:50,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 21:32:50,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.26 | bwd_microstep: 5167.31 | bwd_inner_microstep: 5092.14 | bwd_allreduce_microstep: 75.10 | step_microstep: 209.16 [2024-07-29 21:32:50,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd: 29346.32 | bwd: 41412.00 | bwd_inner: 40952.64 | bwd_allreduce: 458.90 | step: 209.73 75%|███████▌ | 504/671 [9:49:36<3:14:08, 69.75s/it] {'loss': 1.1353, 'learning_rate': 3.0843845489297698e-06, 'epoch': 0.75} 75%|███████▌ | 504/671 [9:49:36<3:14:08, 69.75s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2400 [2024-07-29 21:32:59,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.01 | bwd_microstep: 5290.02 | bwd_inner_microstep: 4882.00 | bwd_allreduce_microstep: 407.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3798 [2024-07-29 21:33:08,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.45 | bwd_microstep: 5193.57 | bwd_inner_microstep: 5138.85 | bwd_allreduce_microstep: 54.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3585 [2024-07-29 21:33:16,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.35 | bwd_microstep: 5086.57 | bwd_inner_microstep: 5020.31 | bwd_allreduce_microstep: 66.19 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-07-29 21:33:25,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.52 | bwd_microstep: 5201.71 | bwd_inner_microstep: 5142.91 | bwd_allreduce_microstep: 58.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-07-29 21:33:34,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.95 | bwd_microstep: 5158.15 | bwd_inner_microstep: 5083.84 | bwd_allreduce_microstep: 74.25 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3701 [2024-07-29 21:33:42,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3202.87 | bwd_microstep: 4691.45 | bwd_inner_microstep: 4672.10 | 
bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3639 [2024-07-29 21:33:50,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.17 | bwd_microstep: 5009.05 | bwd_inner_microstep: 4937.59 | bwd_allreduce_microstep: 71.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-29 21:33:59,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-29 21:33:59,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.28 | bwd_microstep: 5102.22 | bwd_inner_microstep: 5034.75 | bwd_allreduce_microstep: 67.40 | step_microstep: 181.05 [2024-07-29 21:33:59,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28330.50 | bwd: 40732.71 | bwd_inner: 39912.29 | bwd_allreduce: 819.95 | step: 181.62 75%|███████▌ | 505/671 [9:50:45<3:12:40, 69.64s/it] {'loss': 1.1341, 'learning_rate': 3.0495542671358715e-06, 'epoch': 0.75} 75%|███████▌ | 505/671 [9:50:45<3:12:40, 69.64s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-29 21:34:08,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3718.79 | bwd_microstep: 5462.12 | bwd_inner_microstep: 5343.66 | bwd_allreduce_microstep: 118.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3583 [2024-07-29 21:34:17,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.62 | bwd_microstep: 5114.68 | bwd_inner_microstep: 5036.92 | bwd_allreduce_microstep: 77.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732 [2024-07-29 21:34:26,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.77 | bwd_microstep: 5052.92 | bwd_inner_microstep: 5013.43 | bwd_allreduce_microstep: 39.43 | step_microstep: 0.10 dynamic ViT batch size: 
26, images per sample: 13.0, dynamic token length: 3758 [2024-07-29 21:34:35,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.22 | bwd_microstep: 5010.13 | bwd_inner_microstep: 4990.80 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2186 [2024-07-29 21:34:43,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3035.95 | bwd_microstep: 4968.08 | bwd_inner_microstep: 4586.31 | bwd_allreduce_microstep: 381.70 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3701 [2024-07-29 21:34:51,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3694.82 | bwd_microstep: 4918.47 | bwd_inner_microstep: 4895.25 | bwd_allreduce_microstep: 23.15 | step_microstep: 0.08 dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1197 [2024-07-29 21:35:00,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.58 | bwd_microstep: 5146.58 | bwd_inner_microstep: 4746.74 | bwd_allreduce_microstep: 399.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-29 21:35:09,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 21:35:09,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.05 | bwd_microstep: 4985.51 | bwd_inner_microstep: 4930.44 | bwd_allreduce_microstep: 55.00 | step_microstep: 180.93 [2024-07-29 21:35:09,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28371.70 | bwd: 40658.48 | bwd_inner: 39543.49 | bwd_allreduce: 1114.51 | step: 181.53 75%|███████▌ | 506/671 [9:51:55<3:11:17, 69.56s/it] {'loss': 1.1278, 'learning_rate': 3.0148863475106315e-06, 'epoch': 0.75} 75%|███████▌ | 506/671 [9:51:55<3:11:17, 69.56s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2254 [2024-07-29 21:35:17,307] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3092.27 | bwd_microstep: 5100.16 | bwd_inner_microstep: 4712.07 | bwd_allreduce_microstep: 388.02 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3782 [2024-07-29 21:35:26,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.51 | bwd_microstep: 5215.50 | bwd_inner_microstep: 5157.86 | bwd_allreduce_microstep: 57.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3792 [2024-07-29 21:35:35,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.15 | bwd_microstep: 5215.98 | bwd_inner_microstep: 5162.87 | bwd_allreduce_microstep: 53.04 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3615 [2024-07-29 21:35:43,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.12 | bwd_microstep: 5177.49 | bwd_inner_microstep: 5099.62 | bwd_allreduce_microstep: 77.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-29 21:35:52,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.78 | bwd_microstep: 5033.40 | bwd_inner_microstep: 4994.56 | bwd_allreduce_microstep: 38.77 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2124 [2024-07-29 21:36:01,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3469.81 | bwd_microstep: 5048.29 | bwd_inner_microstep: 4654.83 | bwd_allreduce_microstep: 393.39 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3690 [2024-07-29 21:36:09,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.35 | bwd_microstep: 5052.80 | bwd_inner_microstep: 4993.47 | bwd_allreduce_microstep: 59.26 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 
4.0, dynamic token length: 2140 [2024-07-29 21:36:18,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-29 21:36:18,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3493.62 | bwd_microstep: 5088.51 | bwd_inner_microstep: 4694.41 | bwd_allreduce_microstep: 394.03 | step_microstep: 180.64 [2024-07-29 21:36:18,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28068.50 | bwd: 40932.11 | bwd_inner: 39469.64 | bwd_allreduce: 1462.00 | step: 181.41 76%|███████▌ | 507/671 [9:53:04<3:09:56, 69.49s/it] {'loss': 1.1183, 'learning_rate': 2.98038159989543e-06, 'epoch': 0.75} 76%|███████▌ | 507/671 [9:53:04<3:09:56, 69.49s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3915 [2024-07-29 21:36:27,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.93 | bwd_microstep: 5360.00 | bwd_inner_microstep: 5295.12 | bwd_allreduce_microstep: 64.81 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3822 [2024-07-29 21:36:36,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3795.96 | bwd_microstep: 5181.19 | bwd_inner_microstep: 5146.92 | bwd_allreduce_microstep: 34.21 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3763 [2024-07-29 21:36:45,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.45 | bwd_microstep: 5136.69 | bwd_inner_microstep: 5068.93 | bwd_allreduce_microstep: 67.69 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3748 [2024-07-29 21:36:53,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3102.68 | bwd_microstep: 4952.65 | bwd_inner_microstep: 4911.76 | bwd_allreduce_microstep: 40.82 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2192 [2024-07-29 21:37:02,195] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.11 | bwd_microstep: 5293.22 | bwd_inner_microstep: 4884.79 | bwd_allreduce_microstep: 408.37 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3658 [2024-07-29 21:37:10,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3090.79 | bwd_microstep: 4859.53 | bwd_inner_microstep: 4818.01 | bwd_allreduce_microstep: 41.45 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2205 [2024-07-29 21:37:18,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.09 | bwd_microstep: 5129.93 | bwd_inner_microstep: 4730.61 | bwd_allreduce_microstep: 399.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-29 21:37:27,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 21:37:27,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.73 | bwd_microstep: 5050.39 | bwd_inner_microstep: 5022.37 | bwd_allreduce_microstep: 27.95 | step_microstep: 180.93 [2024-07-29 21:37:27,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28107.66 | bwd: 40963.58 | bwd_inner: 39878.45 | bwd_allreduce: 1084.66 | step: 181.49 76%|███████▌ | 508/671 [9:54:13<3:08:42, 69.46s/it] {'loss': 1.1502, 'learning_rate': 2.9460408303199696e-06, 'epoch': 0.76} 76%|███████▌ | 508/671 [9:54:13<3:08:42, 69.46s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609 [2024-07-29 21:37:36,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3261.07 | bwd_microstep: 4966.12 | bwd_inner_microstep: 4903.31 | bwd_allreduce_microstep: 62.75 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3602 [2024-07-29 21:37:44,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.82 | 
bwd_microstep: 5154.88 | bwd_inner_microstep: 5070.67 | bwd_allreduce_microstep: 84.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3793 [2024-07-29 21:37:52,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3263.07 | bwd_microstep: 4835.41 | bwd_inner_microstep: 4816.05 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2223 [2024-07-29 21:38:01,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.89 | bwd_microstep: 5230.37 | bwd_inner_microstep: 4823.94 | bwd_allreduce_microstep: 406.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2200 [2024-07-29 21:38:10,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3508.80 | bwd_microstep: 5092.81 | bwd_inner_microstep: 4694.82 | bwd_allreduce_microstep: 397.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-29 21:38:19,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.79 | bwd_microstep: 5181.16 | bwd_inner_microstep: 5108.07 | bwd_allreduce_microstep: 73.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3740 [2024-07-29 21:38:27,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.06 | bwd_microstep: 4954.41 | bwd_inner_microstep: 4922.68 | bwd_allreduce_microstep: 31.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-29 21:38:36,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-29 21:38:36,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3713.62 | bwd_microstep: 4895.42 | bwd_inner_microstep: 4876.08 | bwd_allreduce_microstep: 19.27 | step_microstep: 181.06 [2024-07-29 
21:38:36,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28093.02 | bwd: 40310.56 | bwd_inner: 39215.57 | bwd_allreduce: 1094.52 | step: 181.64 76%|███████▌ | 509/671 [9:55:22<3:06:57, 69.24s/it] {'loss': 1.1151, 'learning_rate': 2.9118648409834205e-06, 'epoch': 0.76} 76%|███████▌ | 509/671 [9:55:22<3:06:57, 69.24s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2378 [2024-07-29 21:38:45,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3645.93 | bwd_microstep: 5507.62 | bwd_inner_microstep: 5085.45 | bwd_allreduce_microstep: 422.10 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2236 [2024-07-29 21:38:54,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.34 | bwd_microstep: 5232.56 | bwd_inner_microstep: 4826.19 | bwd_allreduce_microstep: 406.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736 [2024-07-29 21:39:03,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.92 | bwd_microstep: 5052.25 | bwd_inner_microstep: 5024.52 | bwd_allreduce_microstep: 27.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3771 [2024-07-29 21:39:12,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.46 | bwd_microstep: 5191.99 | bwd_inner_microstep: 5138.62 | bwd_allreduce_microstep: 53.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-07-29 21:39:20,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.59 | bwd_microstep: 5228.22 | bwd_inner_microstep: 4820.75 | bwd_allreduce_microstep: 407.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3711 [2024-07-29 21:39:29,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.11 | 
bwd_microstep: 4902.14 | bwd_inner_microstep: 4878.66 | bwd_allreduce_microstep: 23.41 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3675 [2024-07-29 21:39:38,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3690.18 | bwd_microstep: 4910.54 | bwd_inner_microstep: 4891.15 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-07-29 21:39:46,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 21:39:46,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.73 | bwd_microstep: 4997.88 | bwd_inner_microstep: 4947.34 | bwd_allreduce_microstep: 50.47 | step_microstep: 180.70 [2024-07-29 21:39:46,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29014.18 | bwd: 41023.18 | bwd_inner: 39612.63 | bwd_allreduce: 1410.09 | step: 181.27 76%|███████▌ | 510/671 [9:56:32<3:06:42, 69.58s/it] {'loss': 1.0964, 'learning_rate': 2.8778544302356938e-06, 'epoch': 0.76} 76%|███████▌ | 510/671 [9:56:32<3:06:42, 69.58s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3612 [2024-07-29 21:39:55,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3126.63 | bwd_microstep: 5056.42 | bwd_inner_microstep: 4980.08 | bwd_allreduce_microstep: 76.27 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2251 [2024-07-29 21:40:03,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.44 | bwd_microstep: 5169.57 | bwd_inner_microstep: 4766.49 | bwd_allreduce_microstep: 403.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3576 [2024-07-29 21:40:12,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.68 | bwd_microstep: 5125.76 | bwd_inner_microstep: 5047.19 | bwd_allreduce_microstep: 
78.50 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3096 [2024-07-29 21:40:21,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.64 | bwd_microstep: 5171.65 | bwd_inner_microstep: 4870.12 | bwd_allreduce_microstep: 301.46 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193 [2024-07-29 21:40:30,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.57 | bwd_microstep: 5220.33 | bwd_inner_microstep: 4814.22 | bwd_allreduce_microstep: 406.05 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3702 [2024-07-29 21:40:38,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.91 | bwd_microstep: 4947.53 | bwd_inner_microstep: 4920.68 | bwd_allreduce_microstep: 26.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3690 [2024-07-29 21:40:47,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.42 | bwd_microstep: 4985.64 | bwd_inner_microstep: 4936.91 | bwd_allreduce_microstep: 48.66 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-07-29 21:40:56,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 21:40:56,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.95 | bwd_microstep: 5000.55 | bwd_inner_microstep: 4948.81 | bwd_allreduce_microstep: 51.67 | step_microstep: 180.75 [2024-07-29 21:40:56,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28142.13 | bwd: 40677.43 | bwd_inner: 39284.44 | bwd_allreduce: 1392.52 | step: 181.33 76%|███████▌ | 511/671 [9:57:42<3:05:11, 69.45s/it] {'loss': 1.1387, 'learning_rate': 2.8440103925587904e-06, 'epoch': 0.76} 76%|███████▌ | 511/671 [9:57:42<3:05:11, 69.45s/it]dynamic ViT batch size: 20, images per sample: 
10.0, dynamic token length: 3980 [2024-07-29 21:41:05,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.57 | bwd_microstep: 5285.31 | bwd_inner_microstep: 5243.35 | bwd_allreduce_microstep: 41.90 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3584 [2024-07-29 21:41:13,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.58 | bwd_microstep: 5113.76 | bwd_inner_microstep: 5033.96 | bwd_allreduce_microstep: 79.74 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2054 [2024-07-29 21:41:22,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.66 | bwd_microstep: 5207.44 | bwd_inner_microstep: 4801.59 | bwd_allreduce_microstep: 405.78 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2206 [2024-07-29 21:41:31,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.11 | bwd_microstep: 5155.37 | bwd_inner_microstep: 4754.85 | bwd_allreduce_microstep: 400.45 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2188 [2024-07-29 21:41:40,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.45 | bwd_microstep: 5218.19 | bwd_inner_microstep: 4811.40 | bwd_allreduce_microstep: 406.73 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175 [2024-07-29 21:41:48,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.32 | bwd_microstep: 5241.95 | bwd_inner_microstep: 4835.85 | bwd_allreduce_microstep: 406.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3701 [2024-07-29 21:41:57,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.19 | bwd_microstep: 5123.48 | bwd_inner_microstep: 5078.68 | bwd_allreduce_microstep: 44.73 | 
step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-29 21:42:06,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 21:42:06,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3695.32 | bwd_microstep: 4901.24 | bwd_inner_microstep: 4880.74 | bwd_allreduce_microstep: 20.43 | step_microstep: 180.94 [2024-07-29 21:42:06,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28877.10 | bwd: 41246.73 | bwd_inner: 39440.35 | bwd_allreduce: 1805.90 | step: 181.61 76%|███████▋ | 512/671 [9:58:52<3:04:50, 69.75s/it] {'loss': 1.0998, 'learning_rate': 2.810333518548246e-06, 'epoch': 0.76} 76%|███████▋ | 512/671 [9:58:52<3:04:50, 69.75s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3839 [2024-07-29 21:42:14,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3291.49 | bwd_microstep: 4997.69 | bwd_inner_microstep: 4961.83 | bwd_allreduce_microstep: 35.80 | step_microstep: 0.10 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1208 [2024-07-29 21:42:23,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.63 | bwd_microstep: 5231.04 | bwd_inner_microstep: 4826.15 | bwd_allreduce_microstep: 404.83 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2190 [2024-07-29 21:42:32,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.63 | bwd_microstep: 5180.33 | bwd_inner_microstep: 4777.29 | bwd_allreduce_microstep: 402.97 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2186 [2024-07-29 21:42:40,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3471.37 | bwd_microstep: 5092.63 | bwd_inner_microstep: 4696.81 | bwd_allreduce_microstep: 395.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic 
token length: 3701 [2024-07-29 21:42:49,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.93 | bwd_microstep: 5044.16 | bwd_inner_microstep: 5003.47 | bwd_allreduce_microstep: 40.61 | step_microstep: 0.09 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1074 [2024-07-29 21:42:58,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.53 | bwd_microstep: 5265.48 | bwd_inner_microstep: 4860.47 | bwd_allreduce_microstep: 404.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3712 [2024-07-29 21:43:06,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3182.25 | bwd_microstep: 4699.14 | bwd_inner_microstep: 4677.03 | bwd_allreduce_microstep: 22.04 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2137 [2024-07-29 21:43:15,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.43 [2024-07-29 21:43:15,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3473.92 | bwd_microstep: 5112.64 | bwd_inner_microstep: 4715.43 | bwd_allreduce_microstep: 397.13 | step_microstep: 180.97 [2024-07-29 21:43:15,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27704.65 | bwd: 40623.08 | bwd_inner: 38518.42 | bwd_allreduce: 2104.20 | step: 181.56 76%|███████▋ | 513/671 [10:00:01<3:02:48, 69.42s/it] {'loss': 1.1495, 'learning_rate': 2.7768245948946615e-06, 'epoch': 0.76} 76%|███████▋ | 513/671 [10:00:01<3:02:48, 69.42s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3837 [2024-07-29 21:43:24,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3682.63 | bwd_microstep: 5327.36 | bwd_inner_microstep: 5263.59 | bwd_allreduce_microstep: 63.70 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3773 [2024-07-29 21:43:33,074] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 3782.90 | bwd_microstep: 5075.26 | bwd_inner_microstep: 5045.87 | bwd_allreduce_microstep: 29.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3818 [2024-07-29 21:43:41,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.14 | bwd_microstep: 5205.66 | bwd_inner_microstep: 5153.54 | bwd_allreduce_microstep: 52.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3775 [2024-07-29 21:43:50,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.67 | bwd_microstep: 5123.08 | bwd_inner_microstep: 5076.67 | bwd_allreduce_microstep: 46.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3745 [2024-07-29 21:43:59,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.73 | bwd_microstep: 4998.03 | bwd_inner_microstep: 4978.58 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-29 21:44:08,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.82 | bwd_microstep: 4931.90 | bwd_inner_microstep: 4906.69 | bwd_allreduce_microstep: 25.14 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2178 [2024-07-29 21:44:16,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.96 | bwd_microstep: 5156.39 | bwd_inner_microstep: 4754.71 | bwd_allreduce_microstep: 401.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-29 21:44:25,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.46 [2024-07-29 21:44:25,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3241.34 | bwd_microstep: 4829.31 | bwd_inner_microstep: 4791.70 | bwd_allreduce_microstep: 37.54 | 
step_microstep: 181.13 [2024-07-29 21:44:25,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28874.10 | bwd: 40646.96 | bwd_inner: 39971.30 | bwd_allreduce: 675.19 | step: 181.69 77%|███████▋ | 514/671 [10:01:10<3:01:59, 69.55s/it] {'loss': 1.153, 'learning_rate': 2.743484404365314e-06, 'epoch': 0.77} 77%|███████▋ | 514/671 [10:01:10<3:01:59, 69.55s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 4042 [2024-07-29 21:44:33,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3666.81 | bwd_microstep: 5226.84 | bwd_inner_microstep: 5186.51 | bwd_allreduce_microstep: 40.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3873 [2024-07-29 21:44:42,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3784.93 | bwd_microstep: 5116.32 | bwd_inner_microstep: 5097.03 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3820 [2024-07-29 21:44:51,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.52 | bwd_microstep: 5058.27 | bwd_inner_microstep: 5038.92 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2211 [2024-07-29 21:45:00,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.14 | bwd_microstep: 5227.45 | bwd_inner_microstep: 4822.66 | bwd_allreduce_microstep: 404.73 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2219 [2024-07-29 21:45:08,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3074.10 | bwd_microstep: 5087.74 | bwd_inner_microstep: 4694.99 | bwd_allreduce_microstep: 392.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735 [2024-07-29 21:45:17,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3741.07 | bwd_microstep: 5037.13 | bwd_inner_microstep: 5010.58 | bwd_allreduce_microstep: 26.48 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3694 [2024-07-29 21:45:26,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.74 | bwd_microstep: 5088.06 | bwd_inner_microstep: 5005.92 | bwd_allreduce_microstep: 82.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674 [2024-07-29 21:45:35,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 21:45:35,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.50 | bwd_microstep: 5074.69 | bwd_inner_microstep: 5013.35 | bwd_allreduce_microstep: 61.28 | step_microstep: 181.50 [2024-07-29 21:45:35,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28760.72 | bwd: 40916.49 | bwd_inner: 39869.89 | bwd_allreduce: 1046.13 | step: 182.07 77%|███████▋ | 515/671 [10:02:20<3:01:11, 69.69s/it] {'loss': 1.1609, 'learning_rate': 2.7103137257858867e-06, 'epoch': 0.77} 77%|███████▋ | 515/671 [10:02:20<3:01:11, 69.69s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3903 [2024-07-29 21:45:44,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3795.78 | bwd_microstep: 5160.47 | bwd_inner_microstep: 5141.31 | bwd_allreduce_microstep: 19.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2297 [2024-07-29 21:45:52,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.53 | bwd_microstep: 5231.99 | bwd_inner_microstep: 4827.06 | bwd_allreduce_microstep: 404.87 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3775 [2024-07-29 21:46:01,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3763.13 | bwd_microstep: 5056.19 | bwd_inner_microstep: 5029.29 
| bwd_allreduce_microstep: 26.83 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2189 [2024-07-29 21:46:10,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.35 | bwd_microstep: 5211.00 | bwd_inner_microstep: 4805.70 | bwd_allreduce_microstep: 405.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3739 [2024-07-29 21:46:19,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.50 | bwd_microstep: 4982.74 | bwd_inner_microstep: 4963.29 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663 [2024-07-29 21:46:27,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3237.52 | bwd_microstep: 4857.49 | bwd_inner_microstep: 4813.28 | bwd_allreduce_microstep: 44.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-29 21:46:35,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.91 | bwd_microstep: 5058.84 | bwd_inner_microstep: 5005.33 | bwd_allreduce_microstep: 53.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-07-29 21:46:44,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 21:46:44,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.57 | bwd_microstep: 5072.75 | bwd_inner_microstep: 5014.08 | bwd_allreduce_microstep: 58.60 | step_microstep: 180.65 [2024-07-29 21:46:44,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28802.19 | bwd: 40631.45 | bwd_inner: 39599.28 | bwd_allreduce: 1031.71 | step: 181.22 77%|███████▋ | 516/671 [10:03:30<3:00:04, 69.71s/it] {'loss': 1.1579, 'learning_rate': 2.6773133340222647e-06, 'epoch': 0.77} 77%|███████▋ | 516/671 [10:03:30<3:00:04, 69.71s/it]dynamic ViT batch 
size: 20, images per sample: 10.0, dynamic token length: 3853 [2024-07-29 21:46:53,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.67 | bwd_microstep: 5299.01 | bwd_inner_microstep: 5236.72 | bwd_allreduce_microstep: 62.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2279 [2024-07-29 21:47:02,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.10 | bwd_microstep: 5302.71 | bwd_inner_microstep: 4890.44 | bwd_allreduce_microstep: 412.20 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3761 [2024-07-29 21:47:11,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.40 | bwd_microstep: 5000.61 | bwd_inner_microstep: 4981.21 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3773 [2024-07-29 21:47:20,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.46 | bwd_microstep: 5165.96 | bwd_inner_microstep: 5113.91 | bwd_allreduce_microstep: 51.98 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2233 [2024-07-29 21:47:29,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.38 | bwd_microstep: 5213.28 | bwd_inner_microstep: 4809.79 | bwd_allreduce_microstep: 403.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3760 [2024-07-29 21:47:37,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.61 | bwd_microstep: 5006.12 | bwd_inner_microstep: 4986.77 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-29 21:47:46,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.92 | bwd_microstep: 5174.38 | bwd_inner_microstep: 5097.42 | 
bwd_allreduce_microstep: 76.90 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-07-29 21:47:55,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.44 [2024-07-29 21:47:55,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.09 | bwd_microstep: 5121.01 | bwd_inner_microstep: 4722.74 | bwd_allreduce_microstep: 398.21 | step_microstep: 180.65 [2024-07-29 21:47:55,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29032.53 | bwd: 41283.04 | bwd_inner: 39838.92 | bwd_allreduce: 1443.60 | step: 181.34 77%|███████▋ | 517/671 [10:04:41<2:59:38, 69.99s/it] {'loss': 1.1728, 'learning_rate': 2.6444839999624496e-06, 'epoch': 0.77} 77%|███████▋ | 517/671 [10:04:41<2:59:38, 69.99s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2364 [2024-07-29 21:48:04,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3672.29 | bwd_microstep: 5228.31 | bwd_inner_microstep: 4823.90 | bwd_allreduce_microstep: 404.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3848 [2024-07-29 21:48:13,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.02 | bwd_microstep: 5142.46 | bwd_inner_microstep: 5100.16 | bwd_allreduce_microstep: 42.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609 [2024-07-29 21:48:22,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3638.83 | bwd_microstep: 5237.87 | bwd_inner_microstep: 5142.81 | bwd_allreduce_microstep: 94.99 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-29 21:48:30,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.56 | bwd_microstep: 4980.94 | bwd_inner_microstep: 4961.53 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.10 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3757 [2024-07-29 21:48:39,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.37 | bwd_microstep: 5088.32 | bwd_inner_microstep: 5043.03 | bwd_allreduce_microstep: 45.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2213 [2024-07-29 21:48:47,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3019.09 | bwd_microstep: 4862.64 | bwd_inner_microstep: 4487.50 | bwd_allreduce_microstep: 375.07 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3714 [2024-07-29 21:48:56,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.45 | bwd_microstep: 4983.81 | bwd_inner_microstep: 4964.40 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3676 [2024-07-29 21:49:04,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-29 21:49:04,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.69 | bwd_microstep: 4934.45 | bwd_inner_microstep: 4909.00 | bwd_allreduce_microstep: 25.39 | step_microstep: 180.67 [2024-07-29 21:49:04,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28713.20 | bwd: 40458.78 | bwd_inner: 39432.26 | bwd_allreduce: 1026.03 | step: 181.24 77%|███████▋ | 518/671 [10:05:50<2:58:05, 69.84s/it] {'loss': 1.1371, 'learning_rate': 2.611826490498527e-06, 'epoch': 0.77} 77%|███████▋ | 518/671 [10:05:50<2:58:05, 69.84s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3955 [2024-07-29 21:49:13,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3651.71 | bwd_microstep: 5224.51 | bwd_inner_microstep: 5160.93 | bwd_allreduce_microstep: 63.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3821 [2024-07-29 
21:49:22,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.31 | bwd_microstep: 5045.82 | bwd_inner_microstep: 5026.39 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746 [2024-07-29 21:49:31,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.93 | bwd_microstep: 5180.11 | bwd_inner_microstep: 5122.89 | bwd_allreduce_microstep: 57.15 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3787 [2024-07-29 21:49:40,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.91 | bwd_microstep: 5135.28 | bwd_inner_microstep: 5090.11 | bwd_allreduce_microstep: 45.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3767 [2024-07-29 21:49:48,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3463.00 | bwd_microstep: 4912.89 | bwd_inner_microstep: 4891.11 | bwd_allreduce_microstep: 21.72 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154 [2024-07-29 21:49:57,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.43 | bwd_microstep: 5235.30 | bwd_inner_microstep: 4828.92 | bwd_allreduce_microstep: 406.31 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2187 [2024-07-29 21:50:06,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3483.47 | bwd_microstep: 5121.05 | bwd_inner_microstep: 4724.19 | bwd_allreduce_microstep: 396.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-29 21:50:14,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 21:50:14,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3402.01 | bwd_microstep: 4938.95 | 
bwd_inner_microstep: 4896.08 | bwd_allreduce_microstep: 42.80 | step_microstep: 180.63 [2024-07-29 21:50:14,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28547.68 | bwd: 40793.89 | bwd_inner: 39740.56 | bwd_allreduce: 1052.87 | step: 181.20 77%|███████▋ | 519/671 [10:07:00<2:56:47, 69.79s/it] {'loss': 1.1787, 'learning_rate': 2.5793415685087797e-06, 'epoch': 0.77} 77%|███████▋ | 519/671 [10:07:00<2:56:47, 69.79s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-29 21:50:23,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.15 | bwd_microstep: 5338.18 | bwd_inner_microstep: 5241.57 | bwd_allreduce_microstep: 96.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3599 [2024-07-29 21:50:32,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.05 | bwd_microstep: 5252.01 | bwd_inner_microstep: 5162.32 | bwd_allreduce_microstep: 89.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3791 [2024-07-29 21:50:41,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.89 | bwd_microstep: 5039.88 | bwd_inner_microstep: 5020.52 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3757 [2024-07-29 21:50:50,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.59 | bwd_microstep: 5006.14 | bwd_inner_microstep: 4986.77 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-07-29 21:50:58,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.02 | bwd_microstep: 5204.43 | bwd_inner_microstep: 4796.71 | bwd_allreduce_microstep: 407.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3738 [2024-07-29 
21:51:07,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.92 | bwd_microstep: 4996.65 | bwd_inner_microstep: 4977.28 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-07-29 21:51:16,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.02 | bwd_microstep: 4972.80 | bwd_inner_microstep: 4923.93 | bwd_allreduce_microstep: 48.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3712 [2024-07-29 21:51:24,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 21:51:24,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.31 | bwd_microstep: 4984.43 | bwd_inner_microstep: 4936.76 | bwd_allreduce_microstep: 47.60 | step_microstep: 180.76 [2024-07-29 21:51:24,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29155.87 | bwd: 40794.49 | bwd_inner: 40045.82 | bwd_allreduce: 748.20 | step: 181.33 77%|███████▋ | 520/671 [10:08:10<2:56:00, 69.94s/it] {'loss': 1.1468, 'learning_rate': 2.5470299928398424e-06, 'epoch': 0.77} 77%|███████▋ | 520/671 [10:08:10<2:56:00, 69.94s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2343 [2024-07-29 21:51:33,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.05 | bwd_microstep: 5372.29 | bwd_inner_microstep: 4956.14 | bwd_allreduce_microstep: 416.08 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3792 [2024-07-29 21:51:42,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.98 | bwd_microstep: 5054.63 | bwd_inner_microstep: 5032.90 | bwd_allreduce_microstep: 21.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620 [2024-07-29 21:51:51,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3611.91 | bwd_microstep: 5174.83 | bwd_inner_microstep: 5091.19 | bwd_allreduce_microstep: 83.58 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2123 [2024-07-29 21:52:00,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.87 | bwd_microstep: 5244.84 | bwd_inner_microstep: 4837.04 | bwd_allreduce_microstep: 407.74 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-29 21:52:08,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3226.84 | bwd_microstep: 4817.93 | bwd_inner_microstep: 4794.19 | bwd_allreduce_microstep: 23.67 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-07-29 21:52:17,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.35 | bwd_microstep: 5083.13 | bwd_inner_microstep: 5040.62 | bwd_allreduce_microstep: 42.43 | step_microstep: 0.11 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-07-29 21:52:25,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.00 | bwd_microstep: 5057.35 | bwd_inner_microstep: 5016.29 | bwd_allreduce_microstep: 41.00 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2135 [2024-07-29 21:52:34,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 21:52:34,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.06 | bwd_microstep: 5089.04 | bwd_inner_microstep: 4692.52 | bwd_allreduce_microstep: 396.45 | step_microstep: 181.39 [2024-07-29 21:52:34,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28653.97 | bwd: 40894.04 | bwd_inner: 39460.84 | bwd_allreduce: 1432.72 | step: 182.00 78%|███████▊ | 521/671 [10:09:20<2:54:47, 69.92s/it] {'loss': 1.1859, 'learning_rate': 2.5148925182889916e-06, 'epoch': 
0.78} 78%|███████▊ | 521/671 [10:09:20<2:54:47, 69.92s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2424 [2024-07-29 21:52:43,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3124.51 | bwd_microstep: 5233.13 | bwd_inner_microstep: 4834.98 | bwd_allreduce_microstep: 398.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2310 [2024-07-29 21:52:51,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.42 | bwd_microstep: 5202.67 | bwd_inner_microstep: 4798.13 | bwd_allreduce_microstep: 404.48 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2293 [2024-07-29 21:53:00,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.66 | bwd_microstep: 5220.20 | bwd_inner_microstep: 4814.72 | bwd_allreduce_microstep: 405.41 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729 [2024-07-29 21:53:09,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3746.13 | bwd_microstep: 4976.75 | bwd_inner_microstep: 4957.36 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741 [2024-07-29 21:53:18,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3732.76 | bwd_microstep: 4990.68 | bwd_inner_microstep: 4971.32 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3749 [2024-07-29 21:53:26,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.76 | bwd_microstep: 5001.41 | bwd_inner_microstep: 4981.98 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-29 21:53:35,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3737.34 | bwd_microstep: 4993.57 | bwd_inner_microstep: 4974.26 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3683 [2024-07-29 21:53:44,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-07-29 21:53:44,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3687.53 | bwd_microstep: 4887.65 | bwd_inner_microstep: 4868.23 | bwd_allreduce_microstep: 19.34 | step_microstep: 181.45 [2024-07-29 21:53:44,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28848.01 | bwd: 40506.04 | bwd_inner: 39200.94 | bwd_allreduce: 1304.63 | step: 182.03 78%|███████▊ | 522/671 [10:10:30<2:53:27, 69.85s/it] {'loss': 1.088, 'learning_rate': 2.4829298955865022e-06, 'epoch': 0.78} 78%|███████▊ | 522/671 [10:10:30<2:53:27, 69.85s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2418 [2024-07-29 21:53:53,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.11 | bwd_microstep: 5434.55 | bwd_inner_microstep: 5016.40 | bwd_allreduce_microstep: 418.08 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2234 [2024-07-29 21:54:01,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3010.96 | bwd_microstep: 4886.70 | bwd_inner_microstep: 4511.28 | bwd_allreduce_microstep: 375.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3744 [2024-07-29 21:54:10,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.83 | bwd_microstep: 5222.64 | bwd_inner_microstep: 5159.89 | bwd_allreduce_microstep: 62.69 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150 [2024-07-29 21:54:18,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.59 | bwd_microstep: 5171.72 | bwd_inner_microstep: 4768.36 | 
bwd_allreduce_microstep: 403.30 | step_microstep: 0.07 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2150 [2024-07-29 21:54:27,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.90 | bwd_microstep: 5115.16 | bwd_inner_microstep: 4716.89 | bwd_allreduce_microstep: 398.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-29 21:54:36,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.26 | bwd_microstep: 5000.87 | bwd_inner_microstep: 4945.58 | bwd_allreduce_microstep: 55.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-29 21:54:44,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3689.29 | bwd_microstep: 4890.46 | bwd_inner_microstep: 4871.10 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-29 21:54:53,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-29 21:54:53,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.27 | bwd_microstep: 5229.26 | bwd_inner_microstep: 4822.62 | bwd_allreduce_microstep: 406.58 | step_microstep: 181.40 [2024-07-29 21:54:53,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28105.11 | bwd: 40951.34 | bwd_inner: 38812.06 | bwd_allreduce: 2138.81 | step: 182.08 78%|███████▊ | 523/671 [10:11:39<2:51:56, 69.71s/it] {'loss': 1.1379, 'learning_rate': 2.451142871378124e-06, 'epoch': 0.78} 78%|███████▊ | 523/671 [10:11:39<2:51:56, 69.71s/it]dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2321 [2024-07-29 21:55:02,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.65 | bwd_microstep: 5192.55 | bwd_inner_microstep: 4793.38 | bwd_allreduce_microstep: 399.10 | step_microstep: 0.08 dynamic ViT batch size: 
14, images per sample: 7.0, dynamic token length: 2228 [2024-07-29 21:55:11,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.23 | bwd_microstep: 5180.39 | bwd_inner_microstep: 4777.99 | bwd_allreduce_microstep: 402.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3805 [2024-07-29 21:55:20,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.46 | bwd_microstep: 5028.67 | bwd_inner_microstep: 5009.38 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2187 [2024-07-29 21:55:28,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.07 | bwd_microstep: 5102.21 | bwd_inner_microstep: 4707.04 | bwd_allreduce_microstep: 395.10 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3721 [2024-07-29 21:55:37,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.98 | bwd_microstep: 5002.25 | bwd_inner_microstep: 4977.85 | bwd_allreduce_microstep: 24.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2207 [2024-07-29 21:55:46,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.65 | bwd_microstep: 5160.77 | bwd_inner_microstep: 4758.69 | bwd_allreduce_microstep: 402.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3697 [2024-07-29 21:55:54,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.55 | bwd_microstep: 5050.63 | bwd_inner_microstep: 5009.72 | bwd_allreduce_microstep: 40.84 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-29 21:56:03,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.86 [2024-07-29 21:56:03,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3684.54 | bwd_microstep: 4897.36 | bwd_inner_microstep: 4878.00 | bwd_allreduce_microstep: 19.30 | step_microstep: 181.50 [2024-07-29 21:56:03,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29009.04 | bwd: 40614.81 | bwd_inner: 38911.99 | bwd_allreduce: 1702.35 | step: 182.06 78%|███████▊ | 524/671 [10:12:49<2:50:57, 69.78s/it] {'loss': 1.1373, 'learning_rate': 2.4195321882076295e-06, 'epoch': 0.78} 78%|███████▊ | 524/671 [10:12:49<2:50:57, 69.78s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3174 [2024-07-29 21:56:12,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.46 | bwd_microstep: 5296.80 | bwd_inner_microstep: 4987.25 | bwd_allreduce_microstep: 309.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3894 [2024-07-29 21:56:20,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3284.93 | bwd_microstep: 4923.96 | bwd_inner_microstep: 4904.66 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3775 [2024-07-29 21:56:29,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.25 | bwd_microstep: 5006.84 | bwd_inner_microstep: 4987.60 | bwd_allreduce_microstep: 19.17 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2092 [2024-07-29 21:56:38,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.60 | bwd_microstep: 5245.46 | bwd_inner_microstep: 4837.64 | bwd_allreduce_microstep: 407.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650 [2024-07-29 21:56:47,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.35 | bwd_microstep: 5202.49 | bwd_inner_microstep: 5123.37 | bwd_allreduce_microstep: 79.05 | step_microstep: 0.08 dynamic ViT batch size: 14, images per 
sample: 7.0, dynamic token length: 2210 [2024-07-29 21:56:56,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.41 | bwd_microstep: 5172.38 | bwd_inner_microstep: 4769.01 | bwd_allreduce_microstep: 403.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3740 [2024-07-29 21:57:04,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.60 | bwd_microstep: 5206.95 | bwd_inner_microstep: 5151.49 | bwd_allreduce_microstep: 55.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 [2024-07-29 21:57:13,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-29 21:57:13,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.51 | bwd_microstep: 4959.83 | bwd_inner_microstep: 4931.86 | bwd_allreduce_microstep: 27.91 | step_microstep: 180.86 [2024-07-29 21:57:13,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28684.01 | bwd: 41014.69 | bwd_inner: 39692.82 | bwd_allreduce: 1321.40 | step: 181.42 78%|███████▊ | 525/671 [10:13:59<2:49:58, 69.85s/it] {'loss': 1.2, 'learning_rate': 2.3880985844994674e-06, 'epoch': 0.78} 78%|███████▊ | 525/671 [10:13:59<2:49:58, 69.85s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3871 [2024-07-29 21:57:22,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3310.88 | bwd_microstep: 5005.93 | bwd_inner_microstep: 4975.00 | bwd_allreduce_microstep: 30.87 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2250 [2024-07-29 21:57:30,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3060.67 | bwd_microstep: 5039.46 | bwd_inner_microstep: 4653.60 | bwd_allreduce_microstep: 385.80 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3830 [2024-07-29 21:57:39,072] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3763.28 | bwd_microstep: 5044.65 | bwd_inner_microstep: 5025.36 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2216 [2024-07-29 21:57:47,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.23 | bwd_microstep: 5193.84 | bwd_inner_microstep: 4788.38 | bwd_allreduce_microstep: 405.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-29 21:57:56,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.43 | bwd_microstep: 4983.61 | bwd_inner_microstep: 4964.15 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-07-29 21:58:05,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.31 | bwd_microstep: 5070.70 | bwd_inner_microstep: 5014.90 | bwd_allreduce_microstep: 55.73 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-29 21:58:13,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3717.32 | bwd_microstep: 4934.03 | bwd_inner_microstep: 4911.09 | bwd_allreduce_microstep: 22.88 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-29 21:58:22,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-29 21:58:22,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3227.08 | bwd_microstep: 4795.56 | bwd_inner_microstep: 4776.18 | bwd_allreduce_microstep: 19.31 | step_microstep: 182.13 [2024-07-29 21:58:22,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27899.09 | bwd: 40067.76 | bwd_inner: 39108.59 | bwd_allreduce: 958.70 | step: 182.71 78%|███████▊ | 526/671 [10:15:08<2:47:41, 69.39s/it] {'loss': 1.1295, 
'learning_rate': 2.3568427945415196e-06, 'epoch': 0.78} 78%|███████▊ | 526/671 [10:15:08<2:47:41, 69.39s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3557 [2024-07-29 21:58:31,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.20 | bwd_microstep: 5482.14 | bwd_inner_microstep: 5385.40 | bwd_allreduce_microstep: 96.68 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2056 [2024-07-29 21:58:39,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3051.76 | bwd_microstep: 5092.55 | bwd_inner_microstep: 4700.20 | bwd_allreduce_microstep: 392.28 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3781 [2024-07-29 21:58:48,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.35 | bwd_microstep: 5141.85 | bwd_inner_microstep: 5090.91 | bwd_allreduce_microstep: 50.87 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2240 [2024-07-29 21:58:56,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.35 | bwd_microstep: 5149.28 | bwd_inner_microstep: 4747.11 | bwd_allreduce_microstep: 402.10 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3684 [2024-07-29 21:59:05,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.87 | bwd_microstep: 5155.83 | bwd_inner_microstep: 5065.32 | bwd_allreduce_microstep: 90.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-29 21:59:14,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.75 | bwd_microstep: 5014.75 | bwd_inner_microstep: 4962.59 | bwd_allreduce_microstep: 52.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2191 [2024-07-29 21:59:22,016] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2983.22 | bwd_microstep: 4851.64 | bwd_inner_microstep: 4477.77 | bwd_allreduce_microstep: 373.79 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-07-29 21:59:30,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-29 21:59:30,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.14 | bwd_microstep: 5228.28 | bwd_inner_microstep: 4820.66 | bwd_allreduce_microstep: 407.55 | step_microstep: 182.49 [2024-07-29 21:59:30,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27462.54 | bwd: 41116.28 | bwd_inner: 39249.90 | bwd_allreduce: 1865.90 | step: 183.19 79%|███████▊ | 527/671 [10:16:16<2:46:10, 69.24s/it] {'loss': 1.1727, 'learning_rate': 2.3257655484679376e-06, 'epoch': 0.78} 79%|███████▊ | 527/671 [10:16:16<2:46:10, 69.24s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3908 [2024-07-29 21:59:39,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3794.47 | bwd_microstep: 5158.80 | bwd_inner_microstep: 5139.74 | bwd_allreduce_microstep: 18.99 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2056 [2024-07-29 21:59:48,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3103.46 | bwd_microstep: 5176.36 | bwd_inner_microstep: 4780.08 | bwd_allreduce_microstep: 396.21 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2311 [2024-07-29 21:59:57,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.11 | bwd_microstep: 5234.36 | bwd_inner_microstep: 4826.49 | bwd_allreduce_microstep: 407.80 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743 [2024-07-29 22:00:05,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.17 | 
bwd_microstep: 4984.90 | bwd_inner_microstep: 4965.57 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3736 [2024-07-29 22:00:14,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.59 | bwd_microstep: 5221.55 | bwd_inner_microstep: 5157.35 | bwd_allreduce_microstep: 64.13 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2194 [2024-07-29 22:00:23,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.45 | bwd_microstep: 5196.00 | bwd_inner_microstep: 4790.67 | bwd_allreduce_microstep: 405.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3694 [2024-07-29 22:00:32,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.18 | bwd_microstep: 5184.31 | bwd_inner_microstep: 5108.30 | bwd_allreduce_microstep: 75.95 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2173 [2024-07-29 22:00:40,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-29 22:00:40,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3039.93 | bwd_microstep: 4937.69 | bwd_inner_microstep: 4560.05 | bwd_allreduce_microstep: 377.57 | step_microstep: 181.21 [2024-07-29 22:00:40,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28010.27 | bwd: 41093.94 | bwd_inner: 39328.19 | bwd_allreduce: 1765.27 | step: 181.81 79%|███████▊ | 528/671 [10:17:26<2:45:09, 69.30s/it] {'loss': 1.1532, 'learning_rate': 2.2948675722421086e-06, 'epoch': 0.79} 79%|███████▊ | 528/671 [10:17:26<2:45:09, 69.30s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2377 [2024-07-29 22:00:49,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3492.55 | bwd_microstep: 5147.49 | bwd_inner_microstep: 4751.21 | 
bwd_allreduce_microstep: 396.21 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2257 [2024-07-29 22:00:57,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.22 | bwd_microstep: 5168.53 | bwd_inner_microstep: 4767.53 | bwd_allreduce_microstep: 400.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-07-29 22:01:06,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.35 | bwd_microstep: 5213.38 | bwd_inner_microstep: 5130.19 | bwd_allreduce_microstep: 83.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-29 22:01:15,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.14 | bwd_microstep: 5186.35 | bwd_inner_microstep: 5109.84 | bwd_allreduce_microstep: 76.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-29 22:01:24,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.66 | bwd_microstep: 5153.57 | bwd_inner_microstep: 5070.35 | bwd_allreduce_microstep: 83.15 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171 [2024-07-29 22:01:33,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.96 | bwd_microstep: 5228.56 | bwd_inner_microstep: 4822.32 | bwd_allreduce_microstep: 406.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667 [2024-07-29 22:01:41,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.86 | bwd_microstep: 5147.03 | bwd_inner_microstep: 5072.60 | bwd_allreduce_microstep: 74.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-29 22:01:50,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 
[2024-07-29 22:01:50,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.14 | bwd_microstep: 5063.85 | bwd_inner_microstep: 5007.33 | bwd_allreduce_microstep: 56.44 | step_microstep: 370.40 [2024-07-29 22:01:50,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28624.78 | bwd: 41308.73 | bwd_inner: 39731.31 | bwd_allreduce: 1576.95 | step: 370.98 79%|███████▉ | 529/671 [10:18:36<2:44:49, 69.64s/it] {'loss': 1.1731, 'learning_rate': 2.264149587639668e-06, 'epoch': 0.79} 79%|███████▉ | 529/671 [10:18:36<2:44:49, 69.64s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3880 [2024-07-29 22:01:59,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3659.77 | bwd_microstep: 5177.49 | bwd_inner_microstep: 5137.92 | bwd_allreduce_microstep: 39.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603 [2024-07-29 22:02:08,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.18 | bwd_microstep: 5297.39 | bwd_inner_microstep: 5198.01 | bwd_allreduce_microstep: 99.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2305 [2024-07-29 22:02:17,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.77 | bwd_microstep: 5264.66 | bwd_inner_microstep: 4858.17 | bwd_allreduce_microstep: 406.43 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3750 [2024-07-29 22:02:26,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.23 | bwd_microstep: 4995.50 | bwd_inner_microstep: 4976.10 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-29 22:02:35,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.69 | bwd_microstep: 4994.77 | bwd_inner_microstep: 4975.18 | 
bwd_allreduce_microstep: 19.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193 [2024-07-29 22:02:43,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.85 | bwd_microstep: 5202.06 | bwd_inner_microstep: 4797.95 | bwd_allreduce_microstep: 404.05 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2159 [2024-07-29 22:02:52,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3502.46 | bwd_microstep: 5090.70 | bwd_inner_microstep: 4692.83 | bwd_allreduce_microstep: 397.81 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2114 [2024-07-29 22:03:01,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 22:03:01,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.66 | bwd_microstep: 5134.94 | bwd_inner_microstep: 4738.62 | bwd_allreduce_microstep: 396.23 | step_microstep: 180.75 [2024-07-29 22:03:01,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28937.52 | bwd: 41157.48 | bwd_inner: 39374.70 | bwd_allreduce: 1782.29 | step: 181.31 79%|███████▉ | 530/671 [10:19:47<2:44:12, 69.88s/it] {'loss': 1.1585, 'learning_rate': 2.2336123122316642e-06, 'epoch': 0.79} 79%|███████▉ | 530/671 [10:19:47<2:44:12, 69.88s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2326 [2024-07-29 22:03:10,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.41 | bwd_microstep: 5388.81 | bwd_inner_microstep: 4971.58 | bwd_allreduce_microstep: 417.16 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3875 [2024-07-29 22:03:19,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3778.46 | bwd_microstep: 5114.65 | bwd_inner_microstep: 5095.39 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.08 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3603 [2024-07-29 22:03:28,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.37 | bwd_microstep: 5234.02 | bwd_inner_microstep: 5137.90 | bwd_allreduce_microstep: 96.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720 [2024-07-29 22:03:36,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3715.43 | bwd_microstep: 4978.92 | bwd_inner_microstep: 4959.47 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740 [2024-07-29 22:03:45,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.78 | bwd_microstep: 5002.71 | bwd_inner_microstep: 4983.36 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3634 [2024-07-29 22:03:54,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.78 | bwd_microstep: 5160.54 | bwd_inner_microstep: 5084.03 | bwd_allreduce_microstep: 76.44 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-29 22:04:03,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.97 | bwd_microstep: 5025.90 | bwd_inner_microstep: 5000.08 | bwd_allreduce_microstep: 25.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717 [2024-07-29 22:04:12,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-29 22:04:12,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.75 | bwd_microstep: 4991.00 | bwd_inner_microstep: 4971.68 | bwd_allreduce_microstep: 19.26 | step_microstep: 181.05 [2024-07-29 22:04:12,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29614.85 | bwd: 40896.53 | bwd_inner: 40203.44 | bwd_allreduce: 692.61 | 
step: 181.63 79%|███████▉ | 531/671 [10:20:58<2:43:43, 70.17s/it] {'loss': 1.0844, 'learning_rate': 2.2032564593677773e-06, 'epoch': 0.79} 79%|███████▉ | 531/671 [10:20:58<2:43:43, 70.17s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3946 [2024-07-29 22:04:21,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3824.53 | bwd_microstep: 5171.19 | bwd_inner_microstep: 5152.02 | bwd_allreduce_microstep: 19.10 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3813 [2024-07-29 22:04:30,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.74 | bwd_microstep: 5063.06 | bwd_inner_microstep: 5039.44 | bwd_allreduce_microstep: 23.56 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2043 [2024-07-29 22:04:38,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.39 | bwd_microstep: 5275.49 | bwd_inner_microstep: 4865.37 | bwd_allreduce_microstep: 410.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736 [2024-07-29 22:04:47,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.55 | bwd_microstep: 5030.60 | bwd_inner_microstep: 5003.28 | bwd_allreduce_microstep: 27.26 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3744 [2024-07-29 22:04:56,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.80 | bwd_microstep: 4992.24 | bwd_inner_microstep: 4972.91 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3730 [2024-07-29 22:05:05,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.20 | bwd_microstep: 5172.65 | bwd_inner_microstep: 5114.81 | bwd_allreduce_microstep: 57.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per 
sample: 13.0, dynamic token length: 3712 [2024-07-29 22:05:13,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.40 | bwd_microstep: 4905.80 | bwd_inner_microstep: 4886.36 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-29 22:05:22,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.70 [2024-07-29 22:05:22,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.10 | bwd_microstep: 4963.55 | bwd_inner_microstep: 4919.86 | bwd_allreduce_microstep: 43.62 | step_microstep: 181.00 [2024-07-29 22:05:22,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29440.62 | bwd: 40574.56 | bwd_inner: 39953.99 | bwd_allreduce: 620.09 | step: 181.69 79%|███████▉ | 532/671 [10:22:08<2:42:40, 70.22s/it] {'loss': 1.1697, 'learning_rate': 2.1730827381596677e-06, 'epoch': 0.79} 79%|███████▉ | 532/671 [10:22:08<2:42:40, 70.22s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3934 [2024-07-29 22:05:31,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3841.98 | bwd_microstep: 5157.05 | bwd_inner_microstep: 5137.88 | bwd_allreduce_microstep: 19.11 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768 [2024-07-29 22:05:40,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.69 | bwd_microstep: 5050.21 | bwd_inner_microstep: 5023.83 | bwd_allreduce_microstep: 26.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3763 [2024-07-29 22:05:49,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3755.52 | bwd_microstep: 5006.23 | bwd_inner_microstep: 4986.91 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2162 [2024-07-29 22:05:57,895] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.33 | bwd_microstep: 5211.23 | bwd_inner_microstep: 4805.56 | bwd_allreduce_microstep: 405.60 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-29 22:06:06,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.31 | bwd_microstep: 5113.76 | bwd_inner_microstep: 5048.31 | bwd_allreduce_microstep: 65.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-07-29 22:06:15,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.06 | bwd_microstep: 4965.33 | bwd_inner_microstep: 4918.04 | bwd_allreduce_microstep: 47.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-29 22:06:23,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.62 | bwd_microstep: 5011.59 | bwd_inner_microstep: 4961.66 | bwd_allreduce_microstep: 49.87 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154 [2024-07-29 22:06:31,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 22:06:31,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3025.32 | bwd_microstep: 4914.45 | bwd_inner_microstep: 4534.08 | bwd_allreduce_microstep: 380.30 | step_microstep: 181.76 [2024-07-29 22:06:31,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28602.75 | bwd: 40429.83 | bwd_inner: 39416.23 | bwd_allreduce: 1013.14 | step: 182.34 79%|███████▉ | 533/671 [10:23:17<2:40:54, 69.96s/it] {'loss': 1.107, 'learning_rate': 2.1430918534643996e-06, 'epoch': 0.79} 79%|███████▉ | 533/671 [10:23:17<2:40:54, 69.96s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3905 [2024-07-29 22:06:40,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.44 | 
bwd_microstep: 5221.89 | bwd_inner_microstep: 5184.71 | bwd_allreduce_microstep: 37.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3857 [2024-07-29 22:06:49,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3649.11 | bwd_microstep: 5098.41 | bwd_inner_microstep: 5060.70 | bwd_allreduce_microstep: 37.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623 [2024-07-29 22:06:58,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3544.86 | bwd_microstep: 5117.63 | bwd_inner_microstep: 5048.65 | bwd_allreduce_microstep: 68.92 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741 [2024-07-29 22:07:06,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.34 | bwd_microstep: 5022.10 | bwd_inner_microstep: 4999.64 | bwd_allreduce_microstep: 22.39 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2200 [2024-07-29 22:07:15,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.47 | bwd_microstep: 5287.11 | bwd_inner_microstep: 4878.79 | bwd_allreduce_microstep: 408.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743 [2024-07-29 22:07:24,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.55 | bwd_microstep: 4987.43 | bwd_inner_microstep: 4968.08 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713 [2024-07-29 22:07:33,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.75 | bwd_microstep: 4987.57 | bwd_inner_microstep: 4968.26 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-29 22:07:42,279] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 22:07:42,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.66 | bwd_microstep: 5148.46 | bwd_inner_microstep: 5078.43 | bwd_allreduce_microstep: 69.96 | step_microstep: 181.17 [2024-07-29 22:07:42,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29233.10 | bwd: 40870.58 | bwd_inner: 40187.20 | bwd_allreduce: 682.91 | step: 181.74 80%|███████▉ | 534/671 [10:24:28<2:40:04, 70.10s/it] {'loss': 1.1906, 'learning_rate': 2.1132845058679917e-06, 'epoch': 0.79} 80%|███████▉ | 534/671 [10:24:28<2:40:04, 70.10s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3940 [2024-07-29 22:07:51,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.33 | bwd_microstep: 5269.66 | bwd_inner_microstep: 5223.86 | bwd_allreduce_microstep: 45.74 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3860 [2024-07-29 22:07:59,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3287.64 | bwd_microstep: 4911.12 | bwd_inner_microstep: 4891.76 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3754 [2024-07-29 22:08:08,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.32 | bwd_microstep: 5144.17 | bwd_inner_microstep: 5090.34 | bwd_allreduce_microstep: 53.76 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2068 [2024-07-29 22:08:17,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.47 | bwd_microstep: 5219.35 | bwd_inner_microstep: 4814.00 | bwd_allreduce_microstep: 405.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-29 22:08:25,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.95 | 
bwd_microstep: 5052.92 | bwd_inner_microstep: 5010.73 | bwd_allreduce_microstep: 42.13 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635 [2024-07-29 22:08:34,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.95 | bwd_microstep: 5016.95 | bwd_inner_microstep: 4958.73 | bwd_allreduce_microstep: 58.16 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2151 [2024-07-29 22:08:42,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2999.97 | bwd_microstep: 4861.63 | bwd_inner_microstep: 4486.18 | bwd_allreduce_microstep: 375.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2133 [2024-07-29 22:08:50,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.42 [2024-07-29 22:08:50,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.94 | bwd_microstep: 5113.50 | bwd_inner_microstep: 4714.99 | bwd_allreduce_microstep: 398.45 | step_microstep: 180.70 [2024-07-29 22:08:50,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27787.48 | bwd: 40589.27 | bwd_inner: 39190.53 | bwd_allreduce: 1398.28 | step: 181.25 80%|███████▉ | 535/671 [10:25:36<2:37:57, 69.69s/it] {'loss': 1.1275, 'learning_rate': 2.083661391669043e-06, 'epoch': 0.8} 80%|███████▉ | 535/671 [10:25:36<2:37:57, 69.69s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-29 22:08:59,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3310.75 | bwd_microstep: 5087.26 | bwd_inner_microstep: 5016.43 | bwd_allreduce_microstep: 70.77 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3802 [2024-07-29 22:09:08,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.34 | bwd_microstep: 5024.94 | bwd_inner_microstep: 5005.63 | 
bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2072 [2024-07-29 22:09:16,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3522.79 | bwd_microstep: 5201.60 | bwd_inner_microstep: 4799.73 | bwd_allreduce_microstep: 401.81 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2174 [2024-07-29 22:09:25,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.83 | bwd_microstep: 5138.49 | bwd_inner_microstep: 4738.35 | bwd_allreduce_microstep: 400.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-29 22:09:34,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.05 | bwd_microstep: 5141.14 | bwd_inner_microstep: 5085.07 | bwd_allreduce_microstep: 56.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717 [2024-07-29 22:09:43,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3728.42 | bwd_microstep: 4990.75 | bwd_inner_microstep: 4971.37 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-29 22:09:51,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.66 | bwd_microstep: 5032.91 | bwd_inner_microstep: 4997.07 | bwd_allreduce_microstep: 35.77 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-29 22:10:00,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 22:10:00,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.07 | bwd_microstep: 5064.61 | bwd_inner_microstep: 5005.72 | bwd_allreduce_microstep: 58.83 | step_microstep: 181.87 [2024-07-29 22:10:00,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd: 28764.81 | bwd: 40681.68 | bwd_inner: 39619.31 | bwd_allreduce: 1061.90 | step: 182.47 80%|███████▉ | 536/671 [10:26:46<2:36:51, 69.71s/it] {'loss': 1.0969, 'learning_rate': 2.0542232028624585e-06, 'epoch': 0.8} 80%|███████▉ | 536/671 [10:26:46<2:36:51, 69.71s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3963 [2024-07-29 22:10:09,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.44 | bwd_microstep: 5248.41 | bwd_inner_microstep: 5203.47 | bwd_allreduce_microstep: 44.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3573 [2024-07-29 22:10:18,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.55 | bwd_microstep: 5149.93 | bwd_inner_microstep: 5067.64 | bwd_allreduce_microstep: 82.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3589 [2024-07-29 22:10:27,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.63 | bwd_microstep: 5151.85 | bwd_inner_microstep: 5077.05 | bwd_allreduce_microstep: 74.74 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755 [2024-07-29 22:10:35,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.95 | bwd_microstep: 4991.43 | bwd_inner_microstep: 4972.12 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-29 22:10:44,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3769.69 | bwd_microstep: 5019.36 | bwd_inner_microstep: 4993.64 | bwd_allreduce_microstep: 25.65 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-07-29 22:10:53,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.56 | bwd_microstep: 5045.98 | bwd_inner_microstep: 5006.37 | 
bwd_allreduce_microstep: 39.55 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2152 [2024-07-29 22:11:02,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3509.86 | bwd_microstep: 5125.88 | bwd_inner_microstep: 4728.49 | bwd_allreduce_microstep: 397.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3683 [2024-07-29 22:11:10,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 22:11:10,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3688.21 | bwd_microstep: 4896.29 | bwd_inner_microstep: 4876.91 | bwd_allreduce_microstep: 19.32 | step_microstep: 180.85 [2024-07-29 22:11:10,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29171.79 | bwd: 40629.11 | bwd_inner: 39925.61 | bwd_allreduce: 703.02 | step: 181.46 80%|████████ | 537/671 [10:27:56<2:35:58, 69.84s/it] {'loss': 1.1445, 'learning_rate': 2.024970627123297e-06, 'epoch': 0.8} 80%|████████ | 537/671 [10:27:56<2:35:58, 69.84s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3855 [2024-07-29 22:11:20,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3682.99 | bwd_microstep: 5581.79 | bwd_inner_microstep: 5533.50 | bwd_allreduce_microstep: 48.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3585 [2024-07-29 22:11:28,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3387.38 | bwd_microstep: 5067.85 | bwd_inner_microstep: 5006.71 | bwd_allreduce_microstep: 61.08 | step_microstep: 0.19 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3616 [2024-07-29 22:11:37,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.71 | bwd_microstep: 5176.75 | bwd_inner_microstep: 5074.82 | bwd_allreduce_microstep: 101.86 | step_microstep: 0.08 dynamic ViT batch size: 
14, images per sample: 7.0, dynamic token length: 3794 [2024-07-29 22:11:45,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3130.37 | bwd_microstep: 5000.73 | bwd_inner_microstep: 4955.31 | bwd_allreduce_microstep: 45.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3718 [2024-07-29 22:11:54,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.73 | bwd_microstep: 5173.03 | bwd_inner_microstep: 5117.20 | bwd_allreduce_microstep: 55.76 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3647 [2024-07-29 22:12:02,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.99 | bwd_microstep: 5005.21 | bwd_inner_microstep: 4927.75 | bwd_allreduce_microstep: 77.39 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3692 [2024-07-29 22:12:11,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3694.70 | bwd_microstep: 4911.73 | bwd_inner_microstep: 4892.36 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2176 [2024-07-29 22:12:20,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 22:12:20,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3488.93 | bwd_microstep: 5065.96 | bwd_inner_microstep: 4673.27 | bwd_allreduce_microstep: 392.63 | step_microstep: 180.93 [2024-07-29 22:12:20,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28140.71 | bwd: 40983.03 | bwd_inner: 40180.85 | bwd_allreduce: 801.70 | step: 181.60 80%|████████ | 538/671 [10:29:06<2:34:33, 69.72s/it] {'loss': 1.0651, 'learning_rate': 1.9959043477907e-06, 'epoch': 0.8} 80%|████████ | 538/671 [10:29:06<2:34:33, 69.72s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3921 [2024-07-29 22:12:29,161] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.28 | bwd_microstep: 5155.36 | bwd_inner_microstep: 5104.51 | bwd_allreduce_microstep: 50.79 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2050 [2024-07-29 22:12:37,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.47 | bwd_microstep: 5223.42 | bwd_inner_microstep: 4818.44 | bwd_allreduce_microstep: 404.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603 [2024-07-29 22:12:46,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.71 | bwd_microstep: 5125.27 | bwd_inner_microstep: 5051.36 | bwd_allreduce_microstep: 73.85 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193 [2024-07-29 22:12:55,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.72 | bwd_microstep: 5210.20 | bwd_inner_microstep: 4805.30 | bwd_allreduce_microstep: 404.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-29 22:13:04,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.86 | bwd_microstep: 5014.07 | bwd_inner_microstep: 4960.78 | bwd_allreduce_microstep: 53.23 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3732 [2024-07-29 22:13:12,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.57 | bwd_microstep: 4982.93 | bwd_inner_microstep: 4963.54 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3713 [2024-07-29 22:13:21,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.66 | bwd_microstep: 5011.67 | bwd_inner_microstep: 4975.17 | bwd_allreduce_microstep: 36.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 
10.0, dynamic token length: 3671 [2024-07-29 22:13:30,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 22:13:30,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.75 | bwd_microstep: 4998.02 | bwd_inner_microstep: 4946.68 | bwd_allreduce_microstep: 51.28 | step_microstep: 181.66 [2024-07-29 22:13:30,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28690.92 | bwd: 40720.93 | bwd_inner: 39625.72 | bwd_allreduce: 1094.74 | step: 182.22 80%|████████ | 539/671 [10:30:16<2:33:24, 69.73s/it] {'loss': 1.0988, 'learning_rate': 1.967025043851939e-06, 'epoch': 0.8} 80%|████████ | 539/671 [10:30:16<2:33:24, 69.73s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3940 [2024-07-29 22:13:38,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3692.08 | bwd_microstep: 5139.44 | bwd_inner_microstep: 5102.15 | bwd_allreduce_microstep: 37.23 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3569 [2024-07-29 22:13:46,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3177.38 | bwd_microstep: 4690.64 | bwd_inner_microstep: 4658.45 | bwd_allreduce_microstep: 32.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728 [2024-07-29 22:13:55,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.45 | bwd_microstep: 5080.01 | bwd_inner_microstep: 5034.93 | bwd_allreduce_microstep: 45.02 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2211 [2024-07-29 22:14:04,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.48 | bwd_microstep: 5230.84 | bwd_inner_microstep: 4825.71 | bwd_allreduce_microstep: 405.07 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3622 [2024-07-29 22:14:13,115] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.46 | bwd_microstep: 5182.28 | bwd_inner_microstep: 5078.64 | bwd_allreduce_microstep: 103.57 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705 [2024-07-29 22:14:21,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3710.74 | bwd_microstep: 4913.99 | bwd_inner_microstep: 4890.91 | bwd_allreduce_microstep: 23.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-29 22:14:30,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.75 | bwd_microstep: 4981.62 | bwd_inner_microstep: 4929.55 | bwd_allreduce_microstep: 52.00 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3686 [2024-07-29 22:14:38,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 22:14:38,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3110.88 | bwd_microstep: 4963.28 | bwd_inner_microstep: 4911.50 | bwd_allreduce_microstep: 51.72 | step_microstep: 181.17 [2024-07-29 22:14:38,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27974.14 | bwd: 40182.10 | bwd_inner: 39431.77 | bwd_allreduce: 749.86 | step: 181.73 80%|████████ | 540/671 [10:31:24<2:31:25, 69.35s/it] {'loss': 1.0866, 'learning_rate': 1.9383333899265368e-06, 'epoch': 0.8} 80%|████████ | 540/671 [10:31:24<2:31:25, 69.35s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3860 [2024-07-29 22:14:47,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3782.71 | bwd_microstep: 5108.75 | bwd_inner_microstep: 5089.62 | bwd_allreduce_microstep: 19.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3810 [2024-07-29 22:14:56,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.86 | 
bwd_microstep: 5044.46 | bwd_inner_microstep: 5025.10 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2200 [2024-07-29 22:15:05,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.41 | bwd_microstep: 5211.84 | bwd_inner_microstep: 4805.97 | bwd_allreduce_microstep: 405.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3628 [2024-07-29 22:15:13,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.50 | bwd_microstep: 5136.25 | bwd_inner_microstep: 5060.99 | bwd_allreduce_microstep: 75.19 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3752 [2024-07-29 22:15:22,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.99 | bwd_microstep: 4987.56 | bwd_inner_microstep: 4968.16 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2191 [2024-07-29 22:15:31,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3550.77 | bwd_microstep: 5210.65 | bwd_inner_microstep: 4805.50 | bwd_allreduce_microstep: 405.09 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730 [2024-07-29 22:15:40,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3712.62 | bwd_microstep: 4989.28 | bwd_inner_microstep: 4969.81 | bwd_allreduce_microstep: 19.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3700 [2024-07-29 22:15:48,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-07-29 22:15:48,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3692.11 | bwd_microstep: 4892.87 | bwd_inner_microstep: 4873.49 | bwd_allreduce_microstep: 19.30 | step_microstep: 182.04 [2024-07-29 
22:15:48,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29327.88 | bwd: 40581.63 | bwd_inner: 39598.59 | bwd_allreduce: 982.57 | step: 182.61 81%|████████ | 541/671 [10:32:34<2:30:50, 69.62s/it] {'loss': 1.1574, 'learning_rate': 1.9098300562505266e-06, 'epoch': 0.81} 81%|████████ | 541/671 [10:32:34<2:30:50, 69.62s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3781 [2024-07-29 22:15:57,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.32 | bwd_microstep: 5132.30 | bwd_inner_microstep: 5092.29 | bwd_allreduce_microstep: 39.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-29 22:16:06,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.67 | bwd_microstep: 5074.45 | bwd_inner_microstep: 5013.31 | bwd_allreduce_microstep: 61.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-29 22:16:15,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.82 | bwd_microstep: 5196.85 | bwd_inner_microstep: 5120.71 | bwd_allreduce_microstep: 76.07 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2232 [2024-07-29 22:16:23,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3054.90 | bwd_microstep: 5046.03 | bwd_inner_microstep: 4656.91 | bwd_allreduce_microstep: 389.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3746 [2024-07-29 22:16:31,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.77 | bwd_microstep: 5026.64 | bwd_inner_microstep: 4989.82 | bwd_allreduce_microstep: 36.76 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3702 [2024-07-29 22:16:40,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3531.72 | 
bwd_microstep: 4987.50 | bwd_inner_microstep: 4930.89 | bwd_allreduce_microstep: 56.54 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3745 [2024-07-29 22:16:48,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3184.62 | bwd_microstep: 4771.38 | bwd_inner_microstep: 4751.99 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-29 22:16:57,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 22:16:57,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.85 | bwd_microstep: 5066.09 | bwd_inner_microstep: 5005.49 | bwd_allreduce_microstep: 60.53 | step_microstep: 180.74 [2024-07-29 22:16:57,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27678.57 | bwd: 40301.22 | bwd_inner: 39561.35 | bwd_allreduce: 739.39 | step: 181.32 81%|████████ | 542/671 [10:33:43<2:28:50, 69.23s/it] {'loss': 1.1884, 'learning_rate': 1.8815157086607826e-06, 'epoch': 0.81} 81%|████████ | 542/671 [10:33:43<2:28:50, 69.23s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2285 [2024-07-29 22:17:06,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.79 | bwd_microstep: 5296.80 | bwd_inner_microstep: 4887.43 | bwd_allreduce_microstep: 409.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2320 [2024-07-29 22:17:14,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.86 | bwd_microstep: 5261.39 | bwd_inner_microstep: 4850.25 | bwd_allreduce_microstep: 411.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3765 [2024-07-29 22:17:23,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.96 | bwd_microstep: 5177.63 | bwd_inner_microstep: 5125.04 | 
bwd_allreduce_microstep: 52.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-29 22:17:32,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.83 | bwd_microstep: 5100.96 | bwd_inner_microstep: 5028.32 | bwd_allreduce_microstep: 72.58 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3735 [2024-07-29 22:17:41,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.47 | bwd_microstep: 5124.83 | bwd_inner_microstep: 5051.53 | bwd_allreduce_microstep: 73.23 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-07-29 22:17:49,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.95 | bwd_microstep: 5031.69 | bwd_inner_microstep: 5007.83 | bwd_allreduce_microstep: 23.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-29 22:17:58,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.35 | bwd_microstep: 5096.39 | bwd_inner_microstep: 5035.74 | bwd_allreduce_microstep: 60.59 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-29 22:18:07,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 22:18:07,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3489.15 | bwd_microstep: 5071.62 | bwd_inner_microstep: 4679.56 | bwd_allreduce_microstep: 391.98 | step_microstep: 181.91 [2024-07-29 22:18:07,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28743.27 | bwd: 41161.31 | bwd_inner: 39665.65 | bwd_allreduce: 1495.19 | step: 182.49 81%|████████ | 543/671 [10:34:53<2:28:19, 69.53s/it] {'loss': 1.1106, 'learning_rate': 1.8533910085794714e-06, 'epoch': 0.81} 81%|████████ | 543/671 [10:34:53<2:28:19, 69.53s/it]dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3532 [2024-07-29 22:18:16,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3650.94 | bwd_microstep: 5282.14 | bwd_inner_microstep: 5182.02 | bwd_allreduce_microstep: 100.06 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3872 [2024-07-29 22:18:24,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3235.73 | bwd_microstep: 4940.06 | bwd_inner_microstep: 4918.66 | bwd_allreduce_microstep: 21.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3804 [2024-07-29 22:18:33,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3773.21 | bwd_microstep: 5051.36 | bwd_inner_microstep: 5032.15 | bwd_allreduce_microstep: 19.15 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3625 [2024-07-29 22:18:42,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3645.07 | bwd_microstep: 5260.29 | bwd_inner_microstep: 5151.46 | bwd_allreduce_microstep: 108.77 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3710 [2024-07-29 22:18:50,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3679.81 | bwd_microstep: 4892.60 | bwd_inner_microstep: 4873.28 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2192 [2024-07-29 22:18:59,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.76 | bwd_microstep: 5121.18 | bwd_inner_microstep: 4723.54 | bwd_allreduce_microstep: 397.58 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2138 [2024-07-29 22:19:08,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3460.67 | bwd_microstep: 5048.13 | bwd_inner_microstep: 4656.98 | 
bwd_allreduce_microstep: 391.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3680 [2024-07-29 22:19:16,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 22:19:16,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3534.88 | bwd_microstep: 5006.82 | bwd_inner_microstep: 4957.04 | bwd_allreduce_microstep: 49.71 | step_microstep: 181.45 [2024-07-29 22:19:16,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28504.97 | bwd: 40602.57 | bwd_inner: 39495.07 | bwd_allreduce: 1107.03 | step: 182.13 81%|████████ | 544/671 [10:36:02<2:27:06, 69.50s/it] {'loss': 1.138, 'learning_rate': 1.8254566129985996e-06, 'epoch': 0.81} 81%|████████ | 544/671 [10:36:02<2:27:06, 69.50s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3588 [2024-07-29 22:19:24,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3109.60 | bwd_microstep: 5054.28 | bwd_inner_microstep: 4894.32 | bwd_allreduce_microstep: 159.90 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3808 [2024-07-29 22:19:33,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.81 | bwd_microstep: 5027.45 | bwd_inner_microstep: 5008.12 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3802 [2024-07-29 22:19:42,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.79 | bwd_microstep: 5355.55 | bwd_inner_microstep: 5293.44 | bwd_allreduce_microstep: 62.04 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3806 [2024-07-29 22:19:51,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.24 | bwd_microstep: 5112.33 | bwd_inner_microstep: 5055.48 | bwd_allreduce_microstep: 56.79 | step_microstep: 0.08 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3638 [2024-07-29 22:19:59,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3218.22 | bwd_microstep: 4780.42 | bwd_inner_microstep: 4744.79 | bwd_allreduce_microstep: 35.57 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2203 [2024-07-29 22:20:08,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3477.87 | bwd_microstep: 5044.36 | bwd_inner_microstep: 4651.76 | bwd_allreduce_microstep: 392.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3742 [2024-07-29 22:20:16,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3239.32 | bwd_microstep: 4874.70 | bwd_inner_microstep: 4849.37 | bwd_allreduce_microstep: 25.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 [2024-07-29 22:20:25,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-29 22:20:25,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.08 | bwd_microstep: 4959.50 | bwd_inner_microstep: 4931.65 | bwd_allreduce_microstep: 27.78 | step_microstep: 460.23 [2024-07-29 22:20:25,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27700.83 | bwd: 40208.58 | bwd_inner: 39428.87 | bwd_allreduce: 779.24 | step: 460.82 81%|████████ | 545/671 [10:37:11<2:25:19, 69.20s/it] {'loss': 1.1997, 'learning_rate': 1.7977131744646692e-06, 'epoch': 0.81} 81%|████████ | 545/671 [10:37:11<2:25:19, 69.20s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3918 [2024-07-29 22:20:33,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3297.81 | bwd_microstep: 4983.99 | bwd_inner_microstep: 4964.93 | bwd_allreduce_microstep: 18.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3588 [2024-07-29 
22:20:41,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3189.49 | bwd_microstep: 4661.21 | bwd_inner_microstep: 4639.24 | bwd_allreduce_microstep: 21.91 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2236 [2024-07-29 22:20:50,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.80 | bwd_microstep: 5192.38 | bwd_inner_microstep: 4790.87 | bwd_allreduce_microstep: 401.44 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2204 [2024-07-29 22:20:59,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.85 | bwd_microstep: 5213.10 | bwd_inner_microstep: 4808.66 | bwd_allreduce_microstep: 404.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-29 22:21:07,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.57 | bwd_microstep: 5102.39 | bwd_inner_microstep: 5034.88 | bwd_allreduce_microstep: 67.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-29 22:21:16,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.47 | bwd_microstep: 4903.88 | bwd_inner_microstep: 4884.27 | bwd_allreduce_microstep: 19.52 | step_microstep: 0.11 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2133 [2024-07-29 22:21:24,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3472.27 | bwd_microstep: 5057.44 | bwd_inner_microstep: 4665.48 | bwd_allreduce_microstep: 391.90 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2164 [2024-07-29 22:21:33,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 22:21:33,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.83 | bwd_microstep: 5244.52 | 
bwd_inner_microstep: 4837.93 | bwd_allreduce_microstep: 406.52 | step_microstep: 180.83 [2024-07-29 22:21:33,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27877.98 | bwd: 40358.89 | bwd_inner: 38626.20 | bwd_allreduce: 1732.20 | step: 181.44 81%|████████▏ | 546/671 [10:38:19<2:23:46, 69.01s/it] {'loss': 1.1939, 'learning_rate': 1.7701613410634367e-06, 'epoch': 0.81} 81%|████████▏ | 546/671 [10:38:19<2:23:46, 69.01s/it]dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2330 [2024-07-29 22:21:42,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.23 | bwd_microstep: 5440.06 | bwd_inner_microstep: 5023.31 | bwd_allreduce_microstep: 416.68 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2280 [2024-07-29 22:21:51,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.68 | bwd_microstep: 5247.56 | bwd_inner_microstep: 4839.90 | bwd_allreduce_microstep: 407.60 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2245 [2024-07-29 22:22:00,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3497.34 | bwd_microstep: 5153.58 | bwd_inner_microstep: 4752.20 | bwd_allreduce_microstep: 401.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3761 [2024-07-29 22:22:09,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.17 | bwd_microstep: 4995.80 | bwd_inner_microstep: 4976.46 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3761 [2024-07-29 22:22:18,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.77 | bwd_microstep: 5054.85 | bwd_inner_microstep: 5030.28 | bwd_allreduce_microstep: 24.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735 
[2024-07-29 22:22:26,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.14 | bwd_microstep: 5141.90 | bwd_inner_microstep: 5087.33 | bwd_allreduce_microstep: 54.50 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2135 [2024-07-29 22:22:35,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.64 | bwd_microstep: 5124.10 | bwd_inner_microstep: 4726.38 | bwd_allreduce_microstep: 397.65 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-29 22:22:44,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 22:22:44,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.94 | bwd_microstep: 5050.15 | bwd_inner_microstep: 5007.33 | bwd_allreduce_microstep: 42.75 | step_microstep: 181.60 [2024-07-29 22:22:44,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29021.81 | bwd: 41207.98 | bwd_inner: 39443.15 | bwd_allreduce: 1764.36 | step: 182.18 82%|████████▏ | 547/671 [10:39:30<2:23:35, 69.48s/it] {'loss': 1.1176, 'learning_rate': 1.7428017564047594e-06, 'epoch': 0.81} 82%|████████▏ | 547/671 [10:39:30<2:23:35, 69.48s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623 [2024-07-29 22:22:53,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3650.42 | bwd_microstep: 5277.85 | bwd_inner_microstep: 5193.95 | bwd_allreduce_microstep: 83.83 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3814 [2024-07-29 22:23:02,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.51 | bwd_microstep: 5039.73 | bwd_inner_microstep: 5020.40 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2266 [2024-07-29 22:23:11,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 3576.86 | bwd_microstep: 5274.66 | bwd_inner_microstep: 4865.05 | bwd_allreduce_microstep: 409.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3719 [2024-07-29 22:23:19,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.14 | bwd_microstep: 5163.75 | bwd_inner_microstep: 5108.20 | bwd_allreduce_microstep: 55.48 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2201 [2024-07-29 22:23:28,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3475.60 | bwd_microstep: 5138.89 | bwd_inner_microstep: 4739.88 | bwd_allreduce_microstep: 398.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3639 [2024-07-29 22:23:37,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.55 | bwd_microstep: 5019.65 | bwd_inner_microstep: 4963.65 | bwd_allreduce_microstep: 55.93 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3657 [2024-07-29 22:23:45,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.96 | bwd_microstep: 5062.11 | bwd_inner_microstep: 4981.32 | bwd_allreduce_microstep: 80.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3665 [2024-07-29 22:23:54,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 22:23:54,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.45 | bwd_microstep: 4986.07 | bwd_inner_microstep: 4937.43 | bwd_allreduce_microstep: 48.57 | step_microstep: 190.12 [2024-07-29 22:23:54,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28729.41 | bwd: 40962.68 | bwd_inner: 39809.82 | bwd_allreduce: 1152.39 | step: 190.70 82%|████████▏ | 548/671 [10:40:40<2:22:46, 69.64s/it] {'loss': 1.1404, 'learning_rate': 1.7156350596075777e-06, 
'epoch': 0.82} 82%|████████▏ | 548/671 [10:40:40<2:22:46, 69.64s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3891 [2024-07-29 22:24:03,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3816.23 | bwd_microstep: 5473.13 | bwd_inner_microstep: 5453.97 | bwd_allreduce_microstep: 19.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3596 [2024-07-29 22:24:12,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.40 | bwd_microstep: 5191.44 | bwd_inner_microstep: 5111.20 | bwd_allreduce_microstep: 80.16 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3773 [2024-07-29 22:24:21,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.32 | bwd_microstep: 5204.63 | bwd_inner_microstep: 5136.25 | bwd_allreduce_microstep: 68.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-07-29 22:24:30,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.87 | bwd_microstep: 4989.72 | bwd_inner_microstep: 4970.41 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3758 [2024-07-29 22:24:38,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.05 | bwd_microstep: 4999.52 | bwd_inner_microstep: 4980.12 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3718 [2024-07-29 22:24:47,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.25 | bwd_microstep: 5120.51 | bwd_inner_microstep: 5055.42 | bwd_allreduce_microstep: 65.03 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-29 22:24:56,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3569.97 | bwd_microstep: 5098.18 | bwd_inner_microstep: 5031.04 | bwd_allreduce_microstep: 67.09 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3691 [2024-07-29 22:25:05,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-29 22:25:05,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.41 | bwd_microstep: 4912.84 | bwd_inner_microstep: 4888.46 | bwd_allreduce_microstep: 24.31 | step_microstep: 183.47 [2024-07-29 22:25:05,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29382.41 | bwd: 40989.96 | bwd_inner: 40626.82 | bwd_allreduce: 362.66 | step: 184.05 82%|████████▏ | 549/671 [10:41:51<2:22:15, 69.96s/it] {'loss': 1.1449, 'learning_rate': 1.6886618852849723e-06, 'epoch': 0.82} 82%|████████▏ | 549/671 [10:41:51<2:22:15, 69.96s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3834 [2024-07-29 22:25:14,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3774.88 | bwd_microstep: 5056.06 | bwd_inner_microstep: 5037.01 | bwd_allreduce_microstep: 18.98 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2037 [2024-07-29 22:25:22,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.35 | bwd_microstep: 5345.01 | bwd_inner_microstep: 4929.85 | bwd_allreduce_microstep: 415.09 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3752 [2024-07-29 22:25:31,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.01 | bwd_microstep: 5042.54 | bwd_inner_microstep: 5016.65 | bwd_allreduce_microstep: 25.83 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-07-29 22:25:40,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.59 | bwd_microstep: 5217.01 | bwd_inner_microstep: 
5127.10 | bwd_allreduce_microstep: 89.85 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2211 [2024-07-29 22:25:48,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3065.41 | bwd_microstep: 5037.81 | bwd_inner_microstep: 4651.06 | bwd_allreduce_microstep: 386.68 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737 [2024-07-29 22:25:57,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.45 | bwd_microstep: 5159.42 | bwd_inner_microstep: 5104.26 | bwd_allreduce_microstep: 55.10 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3679 [2024-07-29 22:26:06,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.53 | bwd_microstep: 5164.17 | bwd_inner_microstep: 5077.57 | bwd_allreduce_microstep: 86.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2155 [2024-07-29 22:26:15,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 22:26:15,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3499.05 | bwd_microstep: 5532.48 | bwd_inner_microstep: 4961.96 | bwd_allreduce_microstep: 570.45 | step_microstep: 181.80 [2024-07-29 22:26:15,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28568.18 | bwd: 41554.50 | bwd_inner: 39905.40 | bwd_allreduce: 1648.62 | step: 182.38 82%|████████▏ | 550/671 [10:43:01<2:21:23, 70.11s/it] {'loss': 1.1488, 'learning_rate': 1.6618828635293561e-06, 'epoch': 0.82} 82%|████████▏ | 550/671 [10:43:01<2:21:23, 70.11s/it]dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3716 [2024-07-29 22:26:24,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.01 | bwd_microstep: 5159.98 | bwd_inner_microstep: 5103.78 | bwd_allreduce_microstep: 56.12 | step_microstep: 0.19 dynamic ViT 
batch size: 20, images per sample: 10.0, dynamic token length: 3580 [2024-07-29 22:26:33,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3545.00 | bwd_microstep: 5109.47 | bwd_inner_microstep: 5033.72 | bwd_allreduce_microstep: 75.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3607 [2024-07-29 22:26:41,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.23 | bwd_microstep: 5156.22 | bwd_inner_microstep: 5077.14 | bwd_allreduce_microstep: 79.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755 [2024-07-29 22:26:50,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3783.85 | bwd_microstep: 5034.32 | bwd_inner_microstep: 5010.42 | bwd_allreduce_microstep: 23.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-29 22:26:58,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3223.97 | bwd_microstep: 4831.14 | bwd_inner_microstep: 4790.63 | bwd_allreduce_microstep: 40.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-29 22:27:06,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.79 | bwd_microstep: 4851.57 | bwd_inner_microstep: 4806.19 | bwd_allreduce_microstep: 45.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-29 22:27:15,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.88 | bwd_microstep: 5090.04 | bwd_inner_microstep: 5024.80 | bwd_allreduce_microstep: 65.17 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702 [2024-07-29 22:27:23,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-29 22:27:23,710] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 3229.06 | bwd_microstep: 4714.76 | bwd_inner_microstep: 4692.58 | bwd_allreduce_microstep: 22.12 | step_microstep: 181.24 [2024-07-29 22:27:23,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27814.72 | bwd: 39947.48 | bwd_inner: 39539.21 | bwd_allreduce: 407.79 | step: 181.92 82%|████████▏ | 551/671 [10:44:09<2:19:00, 69.50s/it] {'loss': 1.1442, 'learning_rate': 1.6352986198977327e-06, 'epoch': 0.82} 82%|████████▏ | 551/671 [10:44:09<2:19:00, 69.50s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3974 [2024-07-29 22:27:32,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3831.88 | bwd_microstep: 5226.74 | bwd_inner_microstep: 5207.49 | bwd_allreduce_microstep: 19.18 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2257 [2024-07-29 22:27:41,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.92 | bwd_microstep: 5266.87 | bwd_inner_microstep: 4856.68 | bwd_allreduce_microstep: 410.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2252 [2024-07-29 22:27:50,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.69 | bwd_microstep: 5248.14 | bwd_inner_microstep: 4840.16 | bwd_allreduce_microstep: 407.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3783 [2024-07-29 22:27:59,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.37 | bwd_microstep: 5116.40 | bwd_inner_microstep: 5070.84 | bwd_allreduce_microstep: 45.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3659 [2024-07-29 22:28:07,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.67 | bwd_microstep: 5088.76 | bwd_inner_microstep: 5026.17 | bwd_allreduce_microstep: 62.51 | step_microstep: 0.08 dynamic ViT batch size: 16, 
images per sample: 8.0, dynamic token length: 3675 [2024-07-29 22:28:15,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3068.53 | bwd_microstep: 4892.54 | bwd_inner_microstep: 4845.74 | bwd_allreduce_microstep: 46.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-29 22:28:24,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.45 | bwd_microstep: 5097.58 | bwd_inner_microstep: 5032.72 | bwd_allreduce_microstep: 64.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-29 22:28:33,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 22:28:33,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.23 | bwd_microstep: 5396.01 | bwd_inner_microstep: 5173.76 | bwd_allreduce_microstep: 222.19 | step_microstep: 182.47 [2024-07-29 22:28:33,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28328.64 | bwd: 41333.02 | bwd_inner: 40053.51 | bwd_allreduce: 1279.02 | step: 183.06 82%|████████▏ | 552/671 [10:45:19<2:18:08, 69.65s/it] {'loss': 1.0947, 'learning_rate': 1.6089097753971061e-06, 'epoch': 0.82} 82%|████████▏ | 552/671 [10:45:19<2:18:08, 69.65s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3874 [2024-07-29 22:28:42,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3795.45 | bwd_microstep: 5142.59 | bwd_inner_microstep: 5123.42 | bwd_allreduce_microstep: 19.10 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2269 [2024-07-29 22:28:51,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.95 | bwd_microstep: 5288.01 | bwd_inner_microstep: 4877.41 | bwd_allreduce_microstep: 410.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3791 [2024-07-29 
22:29:00,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.69 | bwd_microstep: 5215.33 | bwd_inner_microstep: 5160.25 | bwd_allreduce_microstep: 55.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3786 [2024-07-29 22:29:09,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.19 | bwd_microstep: 5168.40 | bwd_inner_microstep: 5116.13 | bwd_allreduce_microstep: 52.21 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2254 [2024-07-29 22:29:17,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.83 | bwd_microstep: 5155.00 | bwd_inner_microstep: 4753.94 | bwd_allreduce_microstep: 400.99 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2209 [2024-07-29 22:29:26,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.21 | bwd_microstep: 5150.81 | bwd_inner_microstep: 4747.50 | bwd_allreduce_microstep: 403.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-29 22:29:35,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.30 | bwd_microstep: 5078.39 | bwd_inner_microstep: 5013.80 | bwd_allreduce_microstep: 64.53 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3672 [2024-07-29 22:29:43,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 22:29:43,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3207.91 | bwd_microstep: 4781.48 | bwd_inner_microstep: 4745.50 | bwd_allreduce_microstep: 35.91 | step_microstep: 180.72 [2024-07-29 22:29:43,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28465.44 | bwd: 40979.99 | bwd_inner: 39537.89 | bwd_allreduce: 1441.63 | step: 181.31 82%|████████▏ | 553/671 [10:46:29<2:17:02, 69.69s/it] 
{'loss': 1.1533, 'learning_rate': 1.5827169464699576e-06, 'epoch': 0.82} 82%|████████▏ | 553/671 [10:46:29<2:17:02, 69.69s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2316 [2024-07-29 22:29:51,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3114.02 | bwd_microstep: 5129.38 | bwd_inner_microstep: 4737.49 | bwd_allreduce_microstep: 391.82 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3809 [2024-07-29 22:30:00,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.32 | bwd_microstep: 5051.15 | bwd_inner_microstep: 5031.74 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3783 [2024-07-29 22:30:09,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.47 | bwd_microstep: 5139.46 | bwd_inner_microstep: 5094.40 | bwd_allreduce_microstep: 45.00 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3645 [2024-07-29 22:30:18,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.80 | bwd_microstep: 5126.72 | bwd_inner_microstep: 5039.59 | bwd_allreduce_microstep: 87.06 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3747 [2024-07-29 22:30:26,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.80 | bwd_microstep: 5108.19 | bwd_inner_microstep: 5036.26 | bwd_allreduce_microstep: 71.87 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3656 [2024-07-29 22:30:34,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3142.27 | bwd_microstep: 4932.49 | bwd_inner_microstep: 4882.42 | bwd_allreduce_microstep: 50.00 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3693 [2024-07-29 22:30:43,527] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.06 | bwd_microstep: 5084.78 | bwd_inner_microstep: 5005.71 | bwd_allreduce_microstep: 79.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3771 [2024-07-29 22:30:52,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-29 22:30:52,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.76 | bwd_microstep: 5024.98 | bwd_inner_microstep: 5005.55 | bwd_allreduce_microstep: 19.37 | step_microstep: 181.24 [2024-07-29 22:30:52,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28088.40 | bwd: 40597.14 | bwd_inner: 39833.11 | bwd_allreduce: 763.56 | step: 181.81 83%|████████▎ | 554/671 [10:47:38<2:15:29, 69.49s/it] {'loss': 1.1549, 'learning_rate': 1.5567207449798517e-06, 'epoch': 0.82} 83%|████████▎ | 554/671 [10:47:38<2:15:29, 69.49s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3593 [2024-07-29 22:31:01,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.01 | bwd_microstep: 5228.42 | bwd_inner_microstep: 5128.58 | bwd_allreduce_microstep: 99.77 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3587 [2024-07-29 22:31:10,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.37 | bwd_microstep: 5226.00 | bwd_inner_microstep: 5119.95 | bwd_allreduce_microstep: 105.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3815 [2024-07-29 22:31:18,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.47 | bwd_microstep: 5079.82 | bwd_inner_microstep: 5039.65 | bwd_allreduce_microstep: 40.11 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2218 [2024-07-29 22:31:26,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3010.33 | 
bwd_microstep: 4869.74 | bwd_inner_microstep: 4496.33 | bwd_allreduce_microstep: 373.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756 [2024-07-29 22:31:35,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.85 | bwd_microstep: 4998.72 | bwd_inner_microstep: 4979.28 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705 [2024-07-29 22:31:44,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3666.17 | bwd_microstep: 4889.02 | bwd_inner_microstep: 4869.64 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3739 [2024-07-29 22:31:52,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3244.44 | bwd_microstep: 4824.41 | bwd_inner_microstep: 4805.03 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3702 [2024-07-29 22:32:00,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-29 22:32:00,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3211.04 | bwd_microstep: 4802.60 | bwd_inner_microstep: 4769.26 | bwd_allreduce_microstep: 33.27 | step_microstep: 180.77 [2024-07-29 22:32:00,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27721.57 | bwd: 39918.71 | bwd_inner: 39207.67 | bwd_allreduce: 710.56 | step: 181.34 83%|████████▎ | 555/671 [10:48:46<2:13:27, 69.03s/it] {'loss': 1.1368, 'learning_rate': 1.5309217781971419e-06, 'epoch': 0.83} 83%|████████▎ | 555/671 [10:48:46<2:13:27, 69.03s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2411 [2024-07-29 22:32:08,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3100.21 | bwd_microstep: 5167.16 | bwd_inner_microstep: 4774.44 | 
bwd_allreduce_microstep: 392.66 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3593 [2024-07-29 22:32:17,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.50 | bwd_microstep: 5159.14 | bwd_inner_microstep: 5055.93 | bwd_allreduce_microstep: 103.14 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3800 [2024-07-29 22:32:26,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.33 | bwd_microstep: 5028.09 | bwd_inner_microstep: 5008.72 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-29 22:32:34,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.31 | bwd_microstep: 5063.45 | bwd_inner_microstep: 4996.99 | bwd_allreduce_microstep: 66.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3742 [2024-07-29 22:32:43,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.20 | bwd_microstep: 5196.12 | bwd_inner_microstep: 5139.83 | bwd_allreduce_microstep: 56.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3745 [2024-07-29 22:32:52,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.84 | bwd_microstep: 5001.19 | bwd_inner_microstep: 4981.88 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3742 [2024-07-29 22:33:01,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.76 | bwd_microstep: 5053.00 | bwd_inner_microstep: 5027.09 | bwd_allreduce_microstep: 25.85 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3700 [2024-07-29 22:33:10,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.35 
[2024-07-29 22:33:10,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.05 | bwd_microstep: 5178.20 | bwd_inner_microstep: 5082.79 | bwd_allreduce_microstep: 95.34 | step_microstep: 181.18 [2024-07-29 22:33:10,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28671.10 | bwd: 40846.33 | bwd_inner: 40067.62 | bwd_allreduce: 778.25 | step: 181.76 83%|████████▎ | 556/671 [10:49:56<2:12:46, 69.28s/it] {'loss': 1.1504, 'learning_rate': 1.5053206487847893e-06, 'epoch': 0.83} 83%|████████▎ | 556/671 [10:49:56<2:12:46, 69.28s/it]dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2028 [2024-07-29 22:33:19,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.06 | bwd_microstep: 5275.44 | bwd_inner_microstep: 4868.41 | bwd_allreduce_microstep: 406.96 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3588 [2024-07-29 22:33:28,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3611.60 | bwd_microstep: 5231.67 | bwd_inner_microstep: 5121.28 | bwd_allreduce_microstep: 110.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2210 [2024-07-29 22:33:36,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.71 | bwd_microstep: 5202.04 | bwd_inner_microstep: 4798.19 | bwd_allreduce_microstep: 403.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-29 22:33:45,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.12 | bwd_microstep: 5107.36 | bwd_inner_microstep: 5037.08 | bwd_allreduce_microstep: 70.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3757 [2024-07-29 22:33:54,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.84 | bwd_microstep: 5182.78 | bwd_inner_microstep: 5128.60 | 
bwd_allreduce_microstep: 54.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172 [2024-07-29 22:34:02,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3480.25 | bwd_microstep: 5126.87 | bwd_inner_microstep: 4730.08 | bwd_allreduce_microstep: 396.73 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3686 [2024-07-29 22:34:11,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.55 | bwd_microstep: 5075.00 | bwd_inner_microstep: 5013.84 | bwd_allreduce_microstep: 61.10 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3679 [2024-07-29 22:34:20,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 22:34:20,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.12 | bwd_microstep: 4973.04 | bwd_inner_microstep: 4913.35 | bwd_allreduce_microstep: 59.63 | step_microstep: 181.07 [2024-07-29 22:34:20,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28509.14 | bwd: 41174.18 | bwd_inner: 39610.77 | bwd_allreduce: 1562.95 | step: 181.65 83%|████████▎ | 557/671 [10:51:06<2:12:02, 69.49s/it] {'loss': 1.1365, 'learning_rate': 1.4799179547842823e-06, 'epoch': 0.83} 83%|████████▎ | 557/671 [10:51:06<2:12:02, 69.49s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3971 [2024-07-29 22:34:29,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3851.05 | bwd_microstep: 5247.58 | bwd_inner_microstep: 5228.50 | bwd_allreduce_microstep: 19.02 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3795 [2024-07-29 22:34:38,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.50 | bwd_microstep: 5063.38 | bwd_inner_microstep: 5044.03 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch 
size: 26, images per sample: 13.0, dynamic token length: 3793 [2024-07-29 22:34:47,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.67 | bwd_microstep: 5037.45 | bwd_inner_microstep: 5018.08 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645 [2024-07-29 22:34:55,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.00 | bwd_microstep: 5124.00 | bwd_inner_microstep: 5054.28 | bwd_allreduce_microstep: 69.66 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2217 [2024-07-29 22:35:04,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.44 | bwd_microstep: 5217.87 | bwd_inner_microstep: 4810.36 | bwd_allreduce_microstep: 407.44 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2186 [2024-07-29 22:35:13,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.56 | bwd_microstep: 5245.63 | bwd_inner_microstep: 4838.56 | bwd_allreduce_microstep: 407.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-29 22:35:22,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.20 | bwd_microstep: 5062.40 | bwd_inner_microstep: 5002.78 | bwd_allreduce_microstep: 59.55 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-29 22:35:30,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 22:35:30,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.51 | bwd_microstep: 5004.93 | bwd_inner_microstep: 4950.39 | bwd_allreduce_microstep: 54.47 | step_microstep: 181.02 [2024-07-29 22:35:30,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29182.83 | bwd: 41003.21 | bwd_inner: 39946.92 | bwd_allreduce: 
1055.82 | step: 181.70 83%|████████▎ | 558/671 [10:52:16<2:11:27, 69.80s/it] {'loss': 1.1375, 'learning_rate': 1.4547142896016586e-06, 'epoch': 0.83} 83%|████████▎ | 558/671 [10:52:16<2:11:27, 69.80s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2440 [2024-07-29 22:35:39,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3082.68 | bwd_microstep: 5117.19 | bwd_inner_microstep: 4727.26 | bwd_allreduce_microstep: 389.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3819 [2024-07-29 22:35:47,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.02 | bwd_microstep: 5226.85 | bwd_inner_microstep: 5175.24 | bwd_allreduce_microstep: 51.55 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2284 [2024-07-29 22:35:56,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.86 | bwd_microstep: 5303.12 | bwd_inner_microstep: 4891.95 | bwd_allreduce_microstep: 411.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3748 [2024-07-29 22:36:04,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3243.25 | bwd_microstep: 4816.78 | bwd_inner_microstep: 4796.76 | bwd_allreduce_microstep: 19.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-29 22:36:13,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.73 | bwd_microstep: 4984.04 | bwd_inner_microstep: 4964.59 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-29 22:36:22,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.09 | bwd_microstep: 5163.78 | bwd_inner_microstep: 5086.46 | bwd_allreduce_microstep: 77.24 | step_microstep: 0.11 dynamic ViT batch size: 20, 
images per sample: 10.0, dynamic token length: 3680 [2024-07-29 22:36:30,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.43 | bwd_microstep: 4835.58 | bwd_inner_microstep: 4794.61 | bwd_allreduce_microstep: 40.90 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2188 [2024-07-29 22:36:39,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 22:36:39,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.61 | bwd_microstep: 5214.54 | bwd_inner_microstep: 4811.88 | bwd_allreduce_microstep: 402.59 | step_microstep: 182.56 [2024-07-29 22:36:39,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27681.55 | bwd: 40661.87 | bwd_inner: 39248.70 | bwd_allreduce: 1412.69 | step: 183.16 83%|████████▎ | 559/671 [10:53:25<2:09:39, 69.46s/it] {'loss': 1.1443, 'learning_rate': 1.4297102419936582e-06, 'epoch': 0.83} 83%|████████▎ | 559/671 [10:53:25<2:09:39, 69.46s/it]dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3916 [2024-07-29 22:36:47,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3257.55 | bwd_microstep: 4995.67 | bwd_inner_microstep: 4970.16 | bwd_allreduce_microstep: 25.44 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2284 [2024-07-29 22:36:56,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.79 | bwd_microstep: 5267.12 | bwd_inner_microstep: 4858.05 | bwd_allreduce_microstep: 409.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3780 [2024-07-29 22:37:05,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.74 | bwd_microstep: 5211.77 | bwd_inner_microstep: 5155.48 | bwd_allreduce_microstep: 56.23 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3624 [2024-07-29 22:37:13,676] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3131.01 | bwd_microstep: 5044.53 | bwd_inner_microstep: 4967.24 | bwd_allreduce_microstep: 77.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-07-29 22:37:22,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.38 | bwd_microstep: 5189.60 | bwd_inner_microstep: 5109.89 | bwd_allreduce_microstep: 79.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3748 [2024-07-29 22:37:31,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.00 | bwd_microstep: 5159.40 | bwd_inner_microstep: 5106.09 | bwd_allreduce_microstep: 53.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2280 [2024-07-29 22:37:39,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3473.99 | bwd_microstep: 5042.94 | bwd_inner_microstep: 4651.04 | bwd_allreduce_microstep: 391.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-29 22:37:48,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 22:37:48,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.35 | bwd_microstep: 5335.87 | bwd_inner_microstep: 5099.24 | bwd_allreduce_microstep: 236.56 | step_microstep: 180.26 [2024-07-29 22:37:48,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27823.72 | bwd: 41246.89 | bwd_inner: 39917.13 | bwd_allreduce: 1329.29 | step: 180.85 83%|████████▎ | 560/671 [10:54:34<2:08:28, 69.44s/it] {'loss': 1.1405, 'learning_rate': 1.4049063960539488e-06, 'epoch': 0.83} 83%|████████▎ | 560/671 [10:54:34<2:08:28, 69.44s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4014 [2024-07-29 22:37:57,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3727.31 | bwd_microstep: 5171.56 | bwd_inner_microstep: 5141.79 | bwd_allreduce_microstep: 29.70 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731 [2024-07-29 22:38:06,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.02 | bwd_microstep: 5078.97 | bwd_inner_microstep: 5049.47 | bwd_allreduce_microstep: 29.43 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3621 [2024-07-29 22:38:15,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.94 | bwd_microstep: 5182.25 | bwd_inner_microstep: 5104.45 | bwd_allreduce_microstep: 77.73 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3636 [2024-07-29 22:38:24,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.33 | bwd_microstep: 5116.50 | bwd_inner_microstep: 5044.98 | bwd_allreduce_microstep: 71.46 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2176 [2024-07-29 22:38:32,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3016.85 | bwd_microstep: 4875.81 | bwd_inner_microstep: 4500.37 | bwd_allreduce_microstep: 375.37 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-29 22:38:40,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.47 | bwd_microstep: 4999.12 | bwd_inner_microstep: 4979.62 | bwd_allreduce_microstep: 19.43 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-29 22:38:49,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.10 | bwd_microstep: 4884.01 | bwd_inner_microstep: 4864.63 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-29 22:38:58,263] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 22:38:58,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.64 | bwd_microstep: 4910.67 | bwd_inner_microstep: 4891.29 | bwd_allreduce_microstep: 19.31 | step_microstep: 180.87 [2024-07-29 22:38:58,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28817.56 | bwd: 40218.88 | bwd_inner: 39576.55 | bwd_allreduce: 641.86 | step: 181.44 84%|████████▎ | 561/671 [10:55:44<2:07:16, 69.42s/it] {'loss': 1.1526, 'learning_rate': 1.3803033311995096e-06, 'epoch': 0.83} 84%|████████▎ | 561/671 [10:55:44<2:07:16, 69.42s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2372 [2024-07-29 22:39:07,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.32 | bwd_microstep: 5459.59 | bwd_inner_microstep: 5042.21 | bwd_allreduce_microstep: 417.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3786 [2024-07-29 22:39:16,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.49 | bwd_microstep: 5196.57 | bwd_inner_microstep: 5139.23 | bwd_allreduce_microstep: 57.29 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2241 [2024-07-29 22:39:24,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3052.28 | bwd_microstep: 5018.98 | bwd_inner_microstep: 4631.76 | bwd_allreduce_microstep: 387.16 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2270 [2024-07-29 22:39:32,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3004.03 | bwd_microstep: 4882.07 | bwd_inner_microstep: 4502.61 | bwd_allreduce_microstep: 379.39 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3646 [2024-07-29 22:39:40,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3499.35 | 
bwd_microstep: 4966.91 | bwd_inner_microstep: 4903.46 | bwd_allreduce_microstep: 63.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674 [2024-07-29 22:39:49,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.34 | bwd_microstep: 5159.32 | bwd_inner_microstep: 5084.30 | bwd_allreduce_microstep: 74.96 | step_microstep: 0.08 dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 627 [2024-07-29 22:39:58,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3474.63 | bwd_microstep: 5179.80 | bwd_inner_microstep: 4781.24 | bwd_allreduce_microstep: 398.50 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722 [2024-07-29 22:40:07,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-29 22:40:07,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.51 | bwd_microstep: 4975.79 | bwd_inner_microstep: 4956.47 | bwd_allreduce_microstep: 19.26 | step_microstep: 180.97 [2024-07-29 22:40:07,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27638.85 | bwd: 40839.02 | bwd_inner: 39041.20 | bwd_allreduce: 1797.35 | step: 181.53 84%|████████▍ | 562/671 [10:56:53<2:05:46, 69.23s/it] {'loss': 1.0707, 'learning_rate': 1.3559016221570663e-06, 'epoch': 0.84} 84%|████████▍ | 562/671 [10:56:53<2:05:46, 69.23s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3686 [2024-07-29 22:40:15,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3645.82 | bwd_microstep: 5254.19 | bwd_inner_microstep: 5175.53 | bwd_allreduce_microstep: 78.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3578 [2024-07-29 22:40:24,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.15 | bwd_microstep: 5161.93 | bwd_inner_microstep: 5077.46 | 
bwd_allreduce_microstep: 84.40 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3796 [2024-07-29 22:40:33,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.84 | bwd_microstep: 5124.32 | bwd_inner_microstep: 5057.67 | bwd_allreduce_microstep: 66.58 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2239 [2024-07-29 22:40:42,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.04 | bwd_microstep: 5240.58 | bwd_inner_microstep: 4832.54 | bwd_allreduce_microstep: 407.98 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3621 [2024-07-29 22:40:51,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.54 | bwd_microstep: 5109.41 | bwd_inner_microstep: 5022.68 | bwd_allreduce_microstep: 86.65 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2122 [2024-07-29 22:40:59,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.62 | bwd_microstep: 5189.12 | bwd_inner_microstep: 4784.02 | bwd_allreduce_microstep: 405.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3736 [2024-07-29 22:41:08,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.06 | bwd_microstep: 4908.94 | bwd_inner_microstep: 4882.67 | bwd_allreduce_microstep: 26.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-29 22:41:16,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 22:41:16,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3193.89 | bwd_microstep: 4715.30 | bwd_inner_microstep: 4692.97 | bwd_allreduce_microstep: 22.26 | step_microstep: 181.08 [2024-07-29 22:41:16,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd: 28261.86 | bwd: 40703.77 | bwd_inner: 39525.47 | bwd_allreduce: 1177.83 | step: 181.66 84%|████████▍ | 563/671 [10:58:02<2:04:39, 69.25s/it] {'loss': 1.1649, 'learning_rate': 1.3317018389496927e-06, 'epoch': 0.84} 84%|████████▍ | 563/671 [10:58:02<2:04:39, 69.25s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3575 [2024-07-29 22:41:25,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3708.76 | bwd_microstep: 5712.75 | bwd_inner_microstep: 5516.82 | bwd_allreduce_microstep: 195.86 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2275 [2024-07-29 22:41:34,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.19 | bwd_microstep: 5262.77 | bwd_inner_microstep: 4851.87 | bwd_allreduce_microstep: 410.83 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3736 [2024-07-29 22:41:43,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.06 | bwd_microstep: 5182.47 | bwd_inner_microstep: 5104.84 | bwd_allreduce_microstep: 77.57 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3742 [2024-07-29 22:41:52,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.41 | bwd_microstep: 5011.76 | bwd_inner_microstep: 4992.37 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3654 [2024-07-29 22:42:01,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.89 | bwd_microstep: 5135.62 | bwd_inner_microstep: 5064.51 | bwd_allreduce_microstep: 71.04 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2178 [2024-07-29 22:42:09,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.92 | bwd_microstep: 5179.38 | bwd_inner_microstep: 4775.40 | 
bwd_allreduce_microstep: 403.92 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-07-29 22:42:17,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3055.56 | bwd_microstep: 5005.75 | bwd_inner_microstep: 4617.70 | bwd_allreduce_microstep: 387.98 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3718 [2024-07-29 22:42:26,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 22:42:26,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.28 | bwd_microstep: 5088.16 | bwd_inner_microstep: 5043.77 | bwd_allreduce_microstep: 44.33 | step_microstep: 181.33 [2024-07-29 22:42:26,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28477.97 | bwd: 41578.65 | bwd_inner: 39967.22 | bwd_allreduce: 1610.96 | step: 181.92 84%|████████▍ | 564/671 [10:59:12<2:04:06, 69.59s/it] {'loss': 1.1294, 'learning_rate': 1.3077045468834714e-06, 'epoch': 0.84} 84%|████████▍ | 564/671 [10:59:12<2:04:06, 69.59s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2414 [2024-07-29 22:42:35,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.74 | bwd_microstep: 5382.23 | bwd_inner_microstep: 4968.81 | bwd_allreduce_microstep: 413.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3771 [2024-07-29 22:42:44,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3704.14 | bwd_microstep: 4998.25 | bwd_inner_microstep: 4978.81 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622 [2024-07-29 22:42:53,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.08 | bwd_microstep: 5148.40 | bwd_inner_microstep: 5071.86 | bwd_allreduce_microstep: 76.47 | step_microstep: 0.08 dynamic ViT batch 
size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-29 22:43:01,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3703.84 | bwd_microstep: 4890.12 | bwd_inner_microstep: 4870.77 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1134 [2024-07-29 22:43:10,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.53 | bwd_microstep: 5266.07 | bwd_inner_microstep: 4861.37 | bwd_allreduce_microstep: 404.63 | step_microstep: 0.08 dynamic ViT batch size: 2, images per sample: 1.0, dynamic token length: 663 [2024-07-29 22:43:19,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3413.48 | bwd_microstep: 5051.87 | bwd_inner_microstep: 4662.38 | bwd_allreduce_microstep: 389.42 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2141 [2024-07-29 22:43:27,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3480.83 | bwd_microstep: 5065.34 | bwd_inner_microstep: 4672.66 | bwd_allreduce_microstep: 392.61 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2138 [2024-07-29 22:43:36,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 22:43:36,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.41 | bwd_microstep: 5077.54 | bwd_inner_microstep: 4683.11 | bwd_allreduce_microstep: 394.37 | step_microstep: 180.76 [2024-07-29 22:43:36,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28577.96 | bwd: 40879.80 | bwd_inner: 38769.71 | bwd_allreduce: 2109.62 | step: 181.33 84%|████████▍ | 565/671 [11:00:22<2:03:02, 69.65s/it] {'loss': 1.0976, 'learning_rate': 1.2839103065343084e-06, 'epoch': 0.84} 84%|████████▍ | 565/671 [11:00:22<2:03:02, 69.65s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2355 [2024-07-29 
22:43:45,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.08 | bwd_microstep: 5336.03 | bwd_inner_microstep: 4925.16 | bwd_allreduce_microstep: 410.80 | step_microstep: 0.11 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3797 [2024-07-29 22:43:53,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3252.28 | bwd_microstep: 4826.46 | bwd_inner_microstep: 4807.02 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3640 [2024-07-29 22:44:02,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.83 | bwd_microstep: 5196.47 | bwd_inner_microstep: 5107.41 | bwd_allreduce_microstep: 88.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-29 22:44:11,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.54 | bwd_microstep: 5122.43 | bwd_inner_microstep: 5048.32 | bwd_allreduce_microstep: 74.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-29 22:44:19,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.41 | bwd_microstep: 5105.26 | bwd_inner_microstep: 5056.72 | bwd_allreduce_microstep: 48.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-29 22:44:28,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.37 | bwd_microstep: 5136.14 | bwd_inner_microstep: 5082.92 | bwd_allreduce_microstep: 53.15 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145 [2024-07-29 22:44:37,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3497.37 | bwd_microstep: 5122.18 | bwd_inner_microstep: 4725.19 | bwd_allreduce_microstep: 396.92 | step_microstep: 0.08 dynamic ViT batch size: 14, 
images per sample: 7.0, dynamic token length: 2165 [2024-07-29 22:44:46,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 22:44:46,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.04 | bwd_microstep: 5116.10 | bwd_inner_microstep: 4719.20 | bwd_allreduce_microstep: 396.83 | step_microstep: 180.85 [2024-07-29 22:44:46,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28279.83 | bwd: 40961.05 | bwd_inner: 39471.90 | bwd_allreduce: 1488.68 | step: 181.46 84%|████████▍ | 566/671 [11:01:32<2:01:50, 69.62s/it] {'loss': 1.1443, 'learning_rate': 1.2603196737348211e-06, 'epoch': 0.84} 84%|████████▍ | 566/671 [11:01:32<2:01:50, 69.62s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2346 [2024-07-29 22:44:54,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.03 | bwd_microstep: 5229.27 | bwd_inner_microstep: 4825.51 | bwd_allreduce_microstep: 403.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3922 [2024-07-29 22:45:03,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3308.20 | bwd_microstep: 4954.66 | bwd_inner_microstep: 4935.32 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3594 [2024-07-29 22:45:11,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.03 | bwd_microstep: 5177.66 | bwd_inner_microstep: 5093.13 | bwd_allreduce_microstep: 84.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741 [2024-07-29 22:45:20,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.57 | bwd_microstep: 4998.48 | bwd_inner_microstep: 4979.06 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712 [2024-07-29 
22:45:29,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3701.55 | bwd_microstep: 5014.18 | bwd_inner_microstep: 4979.11 | bwd_allreduce_microstep: 35.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3656 [2024-07-29 22:45:38,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.29 | bwd_microstep: 5177.07 | bwd_inner_microstep: 5103.11 | bwd_allreduce_microstep: 73.89 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-29 22:45:46,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3682.14 | bwd_microstep: 4888.91 | bwd_inner_microstep: 4869.48 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2175 [2024-07-29 22:45:55,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.43 [2024-07-29 22:45:55,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.51 | bwd_microstep: 5120.43 | bwd_inner_microstep: 4723.51 | bwd_allreduce_microstep: 396.86 | step_microstep: 180.66 [2024-07-29 22:45:55,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28689.24 | bwd: 40560.63 | bwd_inner: 39508.18 | bwd_allreduce: 1051.98 | step: 181.23 85%|████████▍ | 567/671 [11:02:41<2:00:39, 69.61s/it] {'loss': 1.1315, 'learning_rate': 1.2369331995613643e-06, 'epoch': 0.84} 85%|████████▍ | 567/671 [11:02:41<2:00:39, 69.61s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2408 [2024-07-29 22:46:04,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3130.52 | bwd_microstep: 5192.28 | bwd_inner_microstep: 4794.61 | bwd_allreduce_microstep: 397.60 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3827 [2024-07-29 22:46:12,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3640.43 | bwd_microstep: 5243.88 | bwd_inner_microstep: 5189.74 | bwd_allreduce_microstep: 54.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3822 [2024-07-29 22:46:21,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3273.62 | bwd_microstep: 4861.11 | bwd_inner_microstep: 4837.24 | bwd_allreduce_microstep: 23.81 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3781 [2024-07-29 22:46:29,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.23 | bwd_microstep: 5033.76 | bwd_inner_microstep: 5014.47 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2194 [2024-07-29 22:46:38,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.84 | bwd_microstep: 5211.56 | bwd_inner_microstep: 4806.52 | bwd_allreduce_microstep: 404.97 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2195 [2024-07-29 22:46:47,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3526.05 | bwd_microstep: 5120.70 | bwd_inner_microstep: 4724.65 | bwd_allreduce_microstep: 395.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3696 [2024-07-29 22:46:55,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3670.32 | bwd_microstep: 4878.77 | bwd_inner_microstep: 4859.33 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2169 [2024-07-29 22:47:04,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 22:47:04,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3525.16 | bwd_microstep: 5138.92 | bwd_inner_microstep: 4741.19 | bwd_allreduce_microstep: 397.66 | step_microstep: 
180.78 [2024-07-29 22:47:04,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28059.07 | bwd: 40680.95 | bwd_inner: 38967.68 | bwd_allreduce: 1712.80 | step: 181.34 85%|████████▍ | 568/671 [11:03:50<1:59:13, 69.45s/it] {'loss': 1.1398, 'learning_rate': 1.213751430321156e-06, 'epoch': 0.85} 85%|████████▍ | 568/671 [11:03:50<1:59:13, 69.45s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3924 [2024-07-29 22:47:13,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3784.69 | bwd_microstep: 5158.88 | bwd_inner_microstep: 5139.76 | bwd_allreduce_microstep: 19.05 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3597 [2024-07-29 22:47:22,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.77 | bwd_microstep: 5177.84 | bwd_inner_microstep: 5081.57 | bwd_allreduce_microstep: 96.20 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3612 [2024-07-29 22:47:31,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.35 | bwd_microstep: 5217.16 | bwd_inner_microstep: 5118.36 | bwd_allreduce_microstep: 98.73 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2228 [2024-07-29 22:47:40,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.97 | bwd_microstep: 5174.50 | bwd_inner_microstep: 4773.42 | bwd_allreduce_microstep: 401.02 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3734 [2024-07-29 22:47:48,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.90 | bwd_microstep: 4986.11 | bwd_inner_microstep: 4966.66 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3678 [2024-07-29 22:47:57,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3582.37 | bwd_microstep: 5086.14 | bwd_inner_microstep: 5013.49 | bwd_allreduce_microstep: 72.58 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3666 [2024-07-29 22:48:06,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.72 | bwd_microstep: 5099.26 | bwd_inner_microstep: 5016.75 | bwd_allreduce_microstep: 82.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-29 22:48:14,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 22:48:14,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.57 | bwd_microstep: 4915.48 | bwd_inner_microstep: 4874.56 | bwd_allreduce_microstep: 40.85 | step_microstep: 190.74 [2024-07-29 22:48:14,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28975.25 | bwd: 40815.35 | bwd_inner: 39984.52 | bwd_allreduce: 830.36 | step: 191.32 85%|████████▍ | 569/671 [11:05:00<1:58:24, 69.65s/it] {'loss': 1.1239, 'learning_rate': 1.1907749075395126e-06, 'epoch': 0.85} 85%|████████▍ | 569/671 [11:05:00<1:58:24, 69.65s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2390 [2024-07-29 22:48:23,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.67 | bwd_microstep: 5212.38 | bwd_inner_microstep: 4809.70 | bwd_allreduce_microstep: 402.61 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 1547 [2024-07-29 22:48:31,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3056.67 | bwd_microstep: 5104.55 | bwd_inner_microstep: 4713.04 | bwd_allreduce_microstep: 391.44 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3734 [2024-07-29 22:48:40,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.54 | bwd_microstep: 5014.76 | bwd_inner_microstep: 4987.95 | 
bwd_allreduce_microstep: 26.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3771 [2024-07-29 22:48:49,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.03 | bwd_microstep: 5061.83 | bwd_inner_microstep: 5036.92 | bwd_allreduce_microstep: 24.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3681 [2024-07-29 22:48:57,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3188.78 | bwd_microstep: 4730.81 | bwd_inner_microstep: 4705.39 | bwd_allreduce_microstep: 25.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-29 22:49:05,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.68 | bwd_microstep: 5002.18 | bwd_inner_microstep: 4950.37 | bwd_allreduce_microstep: 51.74 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3713 [2024-07-29 22:49:14,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.44 | bwd_microstep: 4983.89 | bwd_inner_microstep: 4964.54 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3664 [2024-07-29 22:49:23,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-29 22:49:23,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.55 | bwd_microstep: 5016.74 | bwd_inner_microstep: 4954.23 | bwd_allreduce_microstep: 62.45 | step_microstep: 181.07 [2024-07-29 22:49:23,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28123.27 | bwd: 40127.12 | bwd_inner: 39122.08 | bwd_allreduce: 1004.57 | step: 181.65 85%|████████▍ | 570/671 [11:06:09<1:56:42, 69.33s/it] {'loss': 1.1014, 'learning_rate': 1.168004167947202e-06, 'epoch': 0.85} 85%|████████▍ | 570/671 [11:06:09<1:56:42, 69.33s/it]dynamic ViT batch size: 
18, images per sample: 9.0, dynamic token length: 3881 [2024-07-29 22:49:32,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3663.94 | bwd_microstep: 5271.45 | bwd_inner_microstep: 5211.65 | bwd_allreduce_microstep: 59.74 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3730 [2024-07-29 22:49:41,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.68 | bwd_microstep: 5040.82 | bwd_inner_microstep: 5014.23 | bwd_allreduce_microstep: 26.52 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3601 [2024-07-29 22:49:49,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3064.55 | bwd_microstep: 4862.16 | bwd_inner_microstep: 4823.04 | bwd_allreduce_microstep: 39.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3775 [2024-07-29 22:49:57,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3775.74 | bwd_microstep: 5045.50 | bwd_inner_microstep: 5022.70 | bwd_allreduce_microstep: 22.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-29 22:50:06,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.57 | bwd_microstep: 5164.35 | bwd_inner_microstep: 5089.74 | bwd_allreduce_microstep: 74.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3739 [2024-07-29 22:50:15,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.36 | bwd_microstep: 5055.28 | bwd_inner_microstep: 5016.40 | bwd_allreduce_microstep: 38.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3753 [2024-07-29 22:50:24,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.60 | bwd_microstep: 5046.25 | bwd_inner_microstep: 5005.98 | 
bwd_allreduce_microstep: 40.20 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-07-29 22:50:32,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 22:50:32,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3459.41 | bwd_microstep: 5031.26 | bwd_inner_microstep: 4639.88 | bwd_allreduce_microstep: 391.31 | step_microstep: 181.34 [2024-07-29 22:50:32,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28521.76 | bwd: 40517.05 | bwd_inner: 39823.56 | bwd_allreduce: 693.01 | step: 181.90 85%|████████▌ | 571/671 [11:07:18<1:55:34, 69.34s/it] {'loss': 1.1451, 'learning_rate': 1.1454397434679022e-06, 'epoch': 0.85} 85%|████████▌ | 571/671 [11:07:18<1:55:34, 69.34s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2442 [2024-07-29 22:50:41,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3125.37 | bwd_microstep: 5169.76 | bwd_inner_microstep: 4775.59 | bwd_allreduce_microstep: 394.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3788 [2024-07-29 22:50:49,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.19 | bwd_microstep: 5123.18 | bwd_inner_microstep: 5079.70 | bwd_allreduce_microstep: 43.42 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2184 [2024-07-29 22:50:58,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.67 | bwd_microstep: 5254.47 | bwd_inner_microstep: 4845.73 | bwd_allreduce_microstep: 408.68 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3761 [2024-07-29 22:51:07,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3761.13 | bwd_microstep: 5053.01 | bwd_inner_microstep: 5026.82 | bwd_allreduce_microstep: 26.13 | step_microstep: 0.08 dynamic ViT batch 
size: 26, images per sample: 13.0, dynamic token length: 3714 [2024-07-29 22:51:16,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.98 | bwd_microstep: 5040.79 | bwd_inner_microstep: 5018.25 | bwd_allreduce_microstep: 22.48 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3699 [2024-07-29 22:51:25,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.81 | bwd_microstep: 5113.87 | bwd_inner_microstep: 5035.95 | bwd_allreduce_microstep: 77.86 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3670 [2024-07-29 22:51:33,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3084.94 | bwd_microstep: 4866.37 | bwd_inner_microstep: 4827.09 | bwd_allreduce_microstep: 39.21 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-29 22:51:41,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-29 22:51:41,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.06 | bwd_microstep: 5084.87 | bwd_inner_microstep: 5022.56 | bwd_allreduce_microstep: 62.25 | step_microstep: 180.58 [2024-07-29 22:51:41,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28043.04 | bwd: 40706.32 | bwd_inner: 39631.62 | bwd_allreduce: 1074.23 | step: 181.16 85%|████████▌ | 572/671 [11:08:27<1:54:17, 69.26s/it] {'loss': 1.1545, 'learning_rate': 1.1230821612057764e-06, 'epoch': 0.85} 85%|████████▌ | 572/671 [11:08:27<1:54:17, 69.26s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3839 [2024-07-29 22:51:50,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.21 | bwd_microstep: 5299.62 | bwd_inner_microstep: 5234.61 | bwd_allreduce_microstep: 64.94 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3828 [2024-07-29 
22:51:59,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.11 | bwd_microstep: 5181.71 | bwd_inner_microstep: 5130.67 | bwd_allreduce_microstep: 50.97 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768 [2024-07-29 22:52:08,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3785.75 | bwd_microstep: 5035.85 | bwd_inner_microstep: 5010.05 | bwd_allreduce_microstep: 25.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3628 [2024-07-29 22:52:17,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3632.12 | bwd_microstep: 5198.58 | bwd_inner_microstep: 5115.04 | bwd_allreduce_microstep: 83.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3751 [2024-07-29 22:52:25,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.37 | bwd_microstep: 4811.63 | bwd_inner_microstep: 4792.20 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3655 [2024-07-29 22:52:34,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.44 | bwd_microstep: 5054.64 | bwd_inner_microstep: 4980.80 | bwd_allreduce_microstep: 73.77 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-29 22:52:42,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.42 | bwd_microstep: 5134.50 | bwd_inner_microstep: 5082.91 | bwd_allreduce_microstep: 51.52 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3697 [2024-07-29 22:52:51,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 22:52:51,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.66 | bwd_microstep: 5041.66 | 
bwd_inner_microstep: 4989.40 | bwd_allreduce_microstep: 52.19 | step_microstep: 181.85 [2024-07-29 22:52:51,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28714.99 | bwd: 40758.15 | bwd_inner: 40335.60 | bwd_allreduce: 422.04 | step: 182.54 85%|████████▌ | 573/671 [11:09:37<1:53:23, 69.43s/it] {'loss': 1.0954, 'learning_rate': 1.1009319434331623e-06, 'epoch': 0.85} 85%|████████▌ | 573/671 [11:09:37<1:53:23, 69.43s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3907 [2024-07-29 22:53:00,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3680.01 | bwd_microstep: 5274.57 | bwd_inner_microstep: 5222.25 | bwd_allreduce_microstep: 52.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3611 [2024-07-29 22:53:08,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3178.32 | bwd_microstep: 4670.99 | bwd_inner_microstep: 4647.52 | bwd_allreduce_microstep: 23.41 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3771 [2024-07-29 22:53:17,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.77 | bwd_microstep: 5184.89 | bwd_inner_microstep: 5132.36 | bwd_allreduce_microstep: 52.46 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3640 [2024-07-29 22:53:26,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.78 | bwd_microstep: 5196.79 | bwd_inner_microstep: 5090.72 | bwd_allreduce_microstep: 105.96 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-29 22:53:35,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.13 | bwd_microstep: 5037.33 | bwd_inner_microstep: 5011.45 | bwd_allreduce_microstep: 25.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3661 [2024-07-29 
22:53:42,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3200.35 | bwd_microstep: 4708.18 | bwd_inner_microstep: 4683.48 | bwd_allreduce_microstep: 24.64 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2127 [2024-07-29 22:53:51,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.40 | bwd_microstep: 5152.06 | bwd_inner_microstep: 4752.76 | bwd_allreduce_microstep: 399.24 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3670 [2024-07-29 22:54:00,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 22:54:00,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.69 | bwd_microstep: 5094.84 | bwd_inner_microstep: 5010.18 | bwd_allreduce_microstep: 84.59 | step_microstep: 182.43 [2024-07-29 22:54:00,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28163.37 | bwd: 40319.64 | bwd_inner: 39550.66 | bwd_allreduce: 768.51 | step: 183.00 86%|████████▌ | 574/671 [11:10:46<1:51:56, 69.24s/it] {'loss': 1.1322, 'learning_rate': 1.0789896075783734e-06, 'epoch': 0.85} 86%|████████▌ | 574/671 [11:10:46<1:51:56, 69.24s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3916 [2024-07-29 22:54:09,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3665.61 | bwd_microstep: 5284.03 | bwd_inner_microstep: 5229.52 | bwd_allreduce_microstep: 54.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3810 [2024-07-29 22:54:18,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.91 | bwd_microstep: 5166.57 | bwd_inner_microstep: 5122.39 | bwd_allreduce_microstep: 44.12 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2241 [2024-07-29 22:54:27,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3564.41 | bwd_microstep: 5227.61 | bwd_inner_microstep: 4820.87 | bwd_allreduce_microstep: 406.67 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2194 [2024-07-29 22:54:35,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3537.30 | bwd_microstep: 5197.02 | bwd_inner_microstep: 4790.27 | bwd_allreduce_microstep: 406.69 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3760 [2024-07-29 22:54:44,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.98 | bwd_microstep: 5002.51 | bwd_inner_microstep: 4983.19 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 1694 [2024-07-29 22:54:53,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.08 | bwd_microstep: 5147.53 | bwd_inner_microstep: 4750.12 | bwd_allreduce_microstep: 397.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-29 22:55:02,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.52 | bwd_microstep: 5127.38 | bwd_inner_microstep: 5057.21 | bwd_allreduce_microstep: 70.11 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-29 22:55:10,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 22:55:10,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3516.79 | bwd_microstep: 5096.78 | bwd_inner_microstep: 4702.22 | bwd_allreduce_microstep: 394.50 | step_microstep: 187.15 [2024-07-29 22:55:10,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28723.50 | bwd: 41249.41 | bwd_inner: 39455.73 | bwd_allreduce: 1793.21 | step: 187.71 86%|████████▌ | 575/671 [11:11:56<1:51:17, 69.56s/it] {'loss': 1.1098, 'learning_rate': 1.0572556662136036e-06, 'epoch': 
0.86} 86%|████████▌ | 575/671 [11:11:56<1:51:17, 69.56s/it]dynamic ViT batch size: 5, images per sample: 2.5, dynamic token length: 1545 [2024-07-29 22:55:19,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3524.18 | bwd_microstep: 5289.67 | bwd_inner_microstep: 4880.06 | bwd_allreduce_microstep: 409.54 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3813 [2024-07-29 22:55:28,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.44 | bwd_microstep: 5211.06 | bwd_inner_microstep: 5141.69 | bwd_allreduce_microstep: 69.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3742 [2024-07-29 22:55:37,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.42 | bwd_microstep: 5058.27 | bwd_inner_microstep: 5033.69 | bwd_allreduce_microstep: 24.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3639 [2024-07-29 22:55:46,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.54 | bwd_microstep: 5186.90 | bwd_inner_microstep: 5111.24 | bwd_allreduce_microstep: 75.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-29 22:55:54,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3215.93 | bwd_microstep: 4691.37 | bwd_inner_microstep: 4670.34 | bwd_allreduce_microstep: 20.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3711 [2024-07-29 22:56:02,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.91 | bwd_microstep: 5062.04 | bwd_inner_microstep: 5002.75 | bwd_allreduce_microstep: 59.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-29 22:56:11,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3591.51 | bwd_microstep: 5413.66 | bwd_inner_microstep: 5234.60 | bwd_allreduce_microstep: 179.00 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3694 [2024-07-29 22:56:19,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 22:56:19,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3197.36 | bwd_microstep: 4712.64 | bwd_inner_microstep: 4689.43 | bwd_allreduce_microstep: 23.14 | step_microstep: 180.83 [2024-07-29 22:56:19,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28114.20 | bwd: 40625.58 | bwd_inner: 39763.75 | bwd_allreduce: 861.37 | step: 181.40 86%|████████▌ | 576/671 [11:13:05<1:49:54, 69.41s/it] {'loss': 1.1495, 'learning_rate': 1.0357306270429623e-06, 'epoch': 0.86} 86%|████████▌ | 576/671 [11:13:05<1:49:54, 69.41s/it]dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2385 [2024-07-29 22:56:28,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.03 | bwd_microstep: 5425.94 | bwd_inner_microstep: 5008.25 | bwd_allreduce_microstep: 417.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3553 [2024-07-29 22:56:37,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.45 | bwd_microstep: 5170.05 | bwd_inner_microstep: 5083.68 | bwd_allreduce_microstep: 86.30 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2056 [2024-07-29 22:56:45,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3007.57 | bwd_microstep: 4895.69 | bwd_inner_microstep: 4518.15 | bwd_allreduce_microstep: 377.47 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2252 [2024-07-29 22:56:53,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3059.94 | bwd_microstep: 5016.73 | bwd_inner_microstep: 4629.68 | 
bwd_allreduce_microstep: 386.98 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2207 [2024-07-29 22:57:02,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.26 | bwd_microstep: 5188.07 | bwd_inner_microstep: 4782.99 | bwd_allreduce_microstep: 405.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-29 22:57:11,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.84 | bwd_microstep: 5164.23 | bwd_inner_microstep: 5084.12 | bwd_allreduce_microstep: 80.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-07-29 22:57:20,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.51 | bwd_microstep: 5019.66 | bwd_inner_microstep: 4994.91 | bwd_allreduce_microstep: 24.68 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-29 22:57:29,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-29 22:57:29,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.23 | bwd_microstep: 4976.16 | bwd_inner_microstep: 4956.81 | bwd_allreduce_microstep: 19.29 | step_microstep: 180.75 [2024-07-29 22:57:29,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27947.74 | bwd: 40856.51 | bwd_inner: 39058.53 | bwd_allreduce: 1797.51 | step: 181.32 86%|████████▌ | 577/671 [11:14:14<1:48:36, 69.33s/it] {'loss': 1.1342, 'learning_rate': 1.014414992890611e-06, 'epoch': 0.86} 86%|████████▌ | 577/671 [11:14:14<1:48:36, 69.33s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2334 [2024-07-29 22:57:37,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.71 | bwd_microstep: 5335.44 | bwd_inner_microstep: 4924.82 | bwd_allreduce_microstep: 410.55 | step_microstep: 0.08 dynamic ViT batch 
size: 20, images per sample: 10.0, dynamic token length: 3558 [2024-07-29 22:57:46,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.35 | bwd_microstep: 5165.80 | bwd_inner_microstep: 5076.72 | bwd_allreduce_microstep: 89.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3596 [2024-07-29 22:57:55,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.96 | bwd_microstep: 5126.78 | bwd_inner_microstep: 5052.95 | bwd_allreduce_microstep: 73.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-29 22:58:03,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.13 | bwd_microstep: 4874.72 | bwd_inner_microstep: 4828.19 | bwd_allreduce_microstep: 46.46 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3649 [2024-07-29 22:58:12,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.35 | bwd_microstep: 5193.38 | bwd_inner_microstep: 5092.54 | bwd_allreduce_microstep: 100.77 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2199 [2024-07-29 22:58:21,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.48 | bwd_microstep: 5103.21 | bwd_inner_microstep: 4709.10 | bwd_allreduce_microstep: 394.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-29 22:58:29,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.34 | bwd_microstep: 5100.85 | bwd_inner_microstep: 5035.96 | bwd_allreduce_microstep: 64.83 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2183 [2024-07-29 22:58:38,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-29 22:58:38,619] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3523.46 | bwd_microstep: 5150.82 | bwd_inner_microstep: 4750.35 | bwd_allreduce_microstep: 400.40 | step_microstep: 181.13 [2024-07-29 22:58:38,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28216.69 | bwd: 41050.97 | bwd_inner: 39470.58 | bwd_allreduce: 1579.94 | step: 181.70 86%|████████▌ | 578/671 [11:15:24<1:47:34, 69.41s/it] {'loss': 1.1954, 'learning_rate': 9.933092616890127e-07, 'epoch': 0.86} 86%|████████▌ | 578/671 [11:15:24<1:47:34, 69.41s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3569 [2024-07-29 22:58:47,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.48 | bwd_microstep: 5248.96 | bwd_inner_microstep: 5155.19 | bwd_allreduce_microstep: 93.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3629 [2024-07-29 22:58:56,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.25 | bwd_microstep: 5200.47 | bwd_inner_microstep: 5123.86 | bwd_allreduce_microstep: 76.56 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3638 [2024-07-29 22:59:05,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.06 | bwd_microstep: 5105.15 | bwd_inner_microstep: 5034.53 | bwd_allreduce_microstep: 70.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3656 [2024-07-29 22:59:13,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.54 | bwd_microstep: 5239.53 | bwd_inner_microstep: 5158.77 | bwd_allreduce_microstep: 80.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3771 [2024-07-29 22:59:22,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.21 | bwd_microstep: 5106.36 | bwd_inner_microstep: 5062.21 | bwd_allreduce_microstep: 44.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images 
per sample: 7.0, dynamic token length: 3703 [2024-07-29 22:59:31,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.33 | bwd_microstep: 5112.76 | bwd_inner_microstep: 5034.10 | bwd_allreduce_microstep: 78.59 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-29 22:59:40,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.22 | bwd_microstep: 4979.21 | bwd_inner_microstep: 4945.91 | bwd_allreduce_microstep: 33.24 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2203 [2024-07-29 22:59:48,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 22:59:48,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.84 | bwd_microstep: 5091.47 | bwd_inner_microstep: 4695.44 | bwd_allreduce_microstep: 395.96 | step_microstep: 181.13 [2024-07-29 22:59:48,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28851.84 | bwd: 41083.90 | bwd_inner: 40209.94 | bwd_allreduce: 873.50 | step: 181.70 86%|████████▋ | 579/671 [11:16:34<1:46:49, 69.66s/it] {'loss': 1.1296, 'learning_rate': 9.724139264673116e-07, 'epoch': 0.86} 86%|████████▋ | 579/671 [11:16:34<1:46:49, 69.66s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2355 [2024-07-29 22:59:58,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3639.78 | bwd_microstep: 5466.00 | bwd_inner_microstep: 5044.34 | bwd_allreduce_microstep: 421.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3881 [2024-07-29 23:00:07,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.62 | bwd_microstep: 5367.82 | bwd_inner_microstep: 5301.73 | bwd_allreduce_microstep: 66.03 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3789 [2024-07-29 23:00:15,907] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3633.07 | bwd_microstep: 5192.52 | bwd_inner_microstep: 5138.04 | bwd_allreduce_microstep: 54.41 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3769 [2024-07-29 23:00:24,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.26 | bwd_microstep: 5013.63 | bwd_inner_microstep: 4994.30 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3769 [2024-07-29 23:00:33,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3631.48 | bwd_microstep: 5233.11 | bwd_inner_microstep: 5171.89 | bwd_allreduce_microstep: 61.16 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3710 [2024-07-29 23:00:42,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.87 | bwd_microstep: 4998.78 | bwd_inner_microstep: 4961.58 | bwd_allreduce_microstep: 37.14 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2129 [2024-07-29 23:00:51,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.32 | bwd_microstep: 5245.98 | bwd_inner_microstep: 4840.11 | bwd_allreduce_microstep: 405.81 | step_microstep: 0.18 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2164 [2024-07-29 23:01:00,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-29 23:01:00,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.40 | bwd_microstep: 5174.90 | bwd_inner_microstep: 4772.29 | bwd_allreduce_microstep: 402.55 | step_microstep: 180.81 [2024-07-29 23:01:00,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29105.70 | bwd: 41692.73 | bwd_inner: 40224.22 | bwd_allreduce: 1468.04 | step: 181.49 86%|████████▋ | 580/671 [11:17:45<1:46:19, 70.10s/it] {'loss': 1.1447, 
'learning_rate': 9.517294753398043e-07, 'epoch': 0.86} 86%|████████▋ | 580/671 [11:17:45<1:46:19, 70.10s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3920 [2024-07-29 23:01:08,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3652.63 | bwd_microstep: 5253.49 | bwd_inner_microstep: 5206.64 | bwd_allreduce_microstep: 46.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3820 [2024-07-29 23:01:17,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.67 | bwd_microstep: 5229.93 | bwd_inner_microstep: 5174.33 | bwd_allreduce_microstep: 55.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3755 [2024-07-29 23:01:26,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.03 | bwd_microstep: 4993.67 | bwd_inner_microstep: 4974.18 | bwd_allreduce_microstep: 19.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3666 [2024-07-29 23:01:35,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.25 | bwd_microstep: 5225.94 | bwd_inner_microstep: 5146.80 | bwd_allreduce_microstep: 79.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3618 [2024-07-29 23:01:44,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.07 | bwd_microstep: 5061.71 | bwd_inner_microstep: 4994.33 | bwd_allreduce_microstep: 67.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3699 [2024-07-29 23:01:52,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.05 | bwd_microstep: 5051.50 | bwd_inner_microstep: 4978.15 | bwd_allreduce_microstep: 73.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3691 [2024-07-29 23:02:01,338] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3697.57 | bwd_microstep: 4931.95 | bwd_inner_microstep: 4909.37 | bwd_allreduce_microstep: 22.51 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3725 [2024-07-29 23:02:10,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 23:02:10,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.25 | bwd_microstep: 4977.97 | bwd_inner_microstep: 4937.12 | bwd_allreduce_microstep: 40.79 | step_microstep: 180.91 [2024-07-29 23:02:10,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29015.45 | bwd: 40726.14 | bwd_inner: 40320.86 | bwd_allreduce: 404.82 | step: 181.48 87%|████████▋ | 581/671 [11:18:56<1:45:08, 70.09s/it] {'loss': 1.082, 'learning_rate': 9.312563914945461e-07, 'epoch': 0.86} 87%|████████▋ | 581/671 [11:18:56<1:45:08, 70.09s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3572 [2024-07-29 23:02:18,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.52 | bwd_microstep: 5196.32 | bwd_inner_microstep: 5064.20 | bwd_allreduce_microstep: 132.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-29 23:02:27,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3646.11 | bwd_microstep: 5293.40 | bwd_inner_microstep: 5193.74 | bwd_allreduce_microstep: 99.59 | step_microstep: 0.08 dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1593 [2024-07-29 23:02:36,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3465.88 | bwd_microstep: 5185.35 | bwd_inner_microstep: 4784.02 | bwd_allreduce_microstep: 401.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3747 [2024-07-29 23:02:45,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.30 | 
bwd_microstep: 5068.71 | bwd_inner_microstep: 5026.70 | bwd_allreduce_microstep: 41.95 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3730 [2024-07-29 23:02:53,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.79 | bwd_microstep: 5041.72 | bwd_inner_microstep: 5001.61 | bwd_allreduce_microstep: 40.05 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2173 [2024-07-29 23:03:02,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.77 | bwd_microstep: 5209.33 | bwd_inner_microstep: 4805.24 | bwd_allreduce_microstep: 404.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664 [2024-07-29 23:03:11,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.44 | bwd_microstep: 4993.18 | bwd_inner_microstep: 4943.67 | bwd_allreduce_microstep: 49.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3714 [2024-07-29 23:03:19,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 23:03:19,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.51 | bwd_microstep: 4937.05 | bwd_inner_microstep: 4905.01 | bwd_allreduce_microstep: 31.97 | step_microstep: 181.57 [2024-07-29 23:03:19,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28556.23 | bwd: 40925.03 | bwd_inner: 39724.12 | bwd_allreduce: 1200.44 | step: 182.15 87%|████████▋ | 582/671 [11:20:05<1:43:50, 70.01s/it] {'loss': 1.1625, 'learning_rate': 9.10995153182056e-07, 'epoch': 0.87} 87%|████████▋ | 582/671 [11:20:05<1:43:50, 70.01s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3571 [2024-07-29 23:03:28,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3664.83 | bwd_microstep: 5355.83 | bwd_inner_microstep: 5244.04 | 
bwd_allreduce_microstep: 111.73 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2299 [2024-07-29 23:03:37,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3495.50 | bwd_microstep: 5153.28 | bwd_inner_microstep: 4753.49 | bwd_allreduce_microstep: 399.73 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2276 [2024-07-29 23:03:46,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3508.95 | bwd_microstep: 5164.13 | bwd_inner_microstep: 4760.82 | bwd_allreduce_microstep: 403.24 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2238 [2024-07-29 23:03:55,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.87 | bwd_microstep: 5233.93 | bwd_inner_microstep: 4826.93 | bwd_allreduce_microstep: 406.93 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2199 [2024-07-29 23:04:02,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3004.80 | bwd_microstep: 4856.80 | bwd_inner_microstep: 4481.28 | bwd_allreduce_microstep: 375.45 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3735 [2024-07-29 23:04:11,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.73 | bwd_microstep: 5191.13 | bwd_inner_microstep: 5099.57 | bwd_allreduce_microstep: 91.49 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-07-29 23:04:20,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.99 | bwd_microstep: 5016.98 | bwd_inner_microstep: 4992.58 | bwd_allreduce_microstep: 24.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-29 23:04:29,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 
[2024-07-29 23:04:29,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.95 | bwd_microstep: 5070.04 | bwd_inner_microstep: 5010.83 | bwd_allreduce_microstep: 59.14 | step_microstep: 180.48 [2024-07-29 23:04:29,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28200.52 | bwd: 41042.10 | bwd_inner: 39169.48 | bwd_allreduce: 1872.14 | step: 181.04 87%|████████▋ | 583/671 [11:21:15<1:42:29, 69.88s/it] {'loss': 1.1774, 'learning_rate': 8.909462337041508e-07, 'epoch': 0.87} 87%|████████▋ | 583/671 [11:21:15<1:42:29, 69.88s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3893 [2024-07-29 23:04:38,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3809.28 | bwd_microstep: 5117.93 | bwd_inner_microstep: 5098.82 | bwd_allreduce_microstep: 19.05 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2224 [2024-07-29 23:04:47,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.44 | bwd_microstep: 5165.36 | bwd_inner_microstep: 4764.82 | bwd_allreduce_microstep: 400.47 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-29 23:04:55,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.81 | bwd_microstep: 4985.95 | bwd_inner_microstep: 4966.09 | bwd_allreduce_microstep: 19.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3663 [2024-07-29 23:05:03,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3230.64 | bwd_microstep: 4849.74 | bwd_inner_microstep: 4808.43 | bwd_allreduce_microstep: 41.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3695 [2024-07-29 23:05:12,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.54 | bwd_microstep: 5080.64 | bwd_inner_microstep: 4996.67 | 
bwd_allreduce_microstep: 83.90 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-29 23:05:21,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3702.63 | bwd_microstep: 4923.62 | bwd_inner_microstep: 4898.58 | bwd_allreduce_microstep: 24.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3655 [2024-07-29 23:05:29,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.50 | bwd_microstep: 5034.57 | bwd_inner_microstep: 4975.86 | bwd_allreduce_microstep: 58.64 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3675 [2024-07-29 23:05:38,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-29 23:05:38,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.89 | bwd_microstep: 4941.26 | bwd_inner_microstep: 4916.29 | bwd_allreduce_microstep: 24.90 | step_microstep: 180.87 [2024-07-29 23:05:38,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28879.63 | bwd: 40099.05 | bwd_inner: 39425.51 | bwd_allreduce: 673.07 | step: 181.44 87%|████████▋ | 584/671 [11:22:24<1:41:04, 69.71s/it] {'loss': 1.1183, 'learning_rate': 8.711101014028855e-07, 'epoch': 0.87} 87%|████████▋ | 584/671 [11:22:24<1:41:04, 69.71s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3910 [2024-07-29 23:05:47,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3670.92 | bwd_microstep: 5319.52 | bwd_inner_microstep: 5263.07 | bwd_allreduce_microstep: 56.39 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3599 [2024-07-29 23:05:56,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.37 | bwd_microstep: 5190.00 | bwd_inner_microstep: 5086.91 | bwd_allreduce_microstep: 103.02 | step_microstep: 0.08 dynamic ViT batch size: 
26, images per sample: 13.0, dynamic token length: 3828 [2024-07-29 23:06:05,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.14 | bwd_microstep: 5052.53 | bwd_inner_microstep: 5033.25 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-07-29 23:06:14,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.47 | bwd_microstep: 5153.96 | bwd_inner_microstep: 5101.57 | bwd_allreduce_microstep: 52.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3669 [2024-07-29 23:06:22,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3221.00 | bwd_microstep: 4833.04 | bwd_inner_microstep: 4792.28 | bwd_allreduce_microstep: 40.69 | step_microstep: 0.08 dynamic ViT batch size: 11, images per sample: 5.5, dynamic token length: 2096 [2024-07-29 23:06:30,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3499.83 | bwd_microstep: 5094.38 | bwd_inner_microstep: 4699.23 | bwd_allreduce_microstep: 395.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635 [2024-07-29 23:06:38,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3178.00 | bwd_microstep: 4698.51 | bwd_inner_microstep: 4675.59 | bwd_allreduce_microstep: 22.86 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2154 [2024-07-29 23:06:46,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 23:06:46,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3021.00 | bwd_microstep: 4938.31 | bwd_inner_microstep: 4560.12 | bwd_allreduce_microstep: 378.12 | step_microstep: 181.08 [2024-07-29 23:06:46,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27560.63 | bwd: 40280.24 | bwd_inner: 39211.96 | bwd_allreduce: 1067.81 
| step: 181.64 87%|████████▋ | 585/671 [11:23:32<1:39:15, 69.24s/it] {'loss': 1.1269, 'learning_rate': 8.514872196496182e-07, 'epoch': 0.87} 87%|████████▋ | 585/671 [11:23:32<1:39:15, 69.24s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3720 [2024-07-29 23:06:55,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3659.68 | bwd_microstep: 5307.65 | bwd_inner_microstep: 5220.30 | bwd_allreduce_microstep: 87.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3585 [2024-07-29 23:07:04,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.25 | bwd_microstep: 5160.29 | bwd_inner_microstep: 5082.23 | bwd_allreduce_microstep: 77.99 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3742 [2024-07-29 23:07:13,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.26 | bwd_microstep: 5141.22 | bwd_inner_microstep: 5073.06 | bwd_allreduce_microstep: 68.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3611 [2024-07-29 23:07:22,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.37 | bwd_microstep: 5114.79 | bwd_inner_microstep: 5045.72 | bwd_allreduce_microstep: 69.00 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2178 [2024-07-29 23:07:30,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.40 | bwd_microstep: 5123.83 | bwd_inner_microstep: 4726.64 | bwd_allreduce_microstep: 397.12 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-29 23:07:39,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.05 | bwd_microstep: 4988.12 | bwd_inner_microstep: 4968.73 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.09 dynamic ViT batch size: 26, images per 
sample: 13.0, dynamic token length: 3717 [2024-07-29 23:07:48,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.50 | bwd_microstep: 4979.43 | bwd_inner_microstep: 4960.01 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-29 23:07:57,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 23:07:57,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.58 | bwd_microstep: 5343.12 | bwd_inner_microstep: 5191.16 | bwd_allreduce_microstep: 151.89 | step_microstep: 180.32 [2024-07-29 23:07:57,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28999.01 | bwd: 41158.42 | bwd_inner: 40267.79 | bwd_allreduce: 890.16 | step: 180.89 87%|████████▋ | 586/671 [11:24:43<1:38:37, 69.62s/it] {'loss': 1.0736, 'learning_rate': 8.320780468341761e-07, 'epoch': 0.87} 87%|████████▋ | 586/671 [11:24:43<1:38:37, 69.62s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2435 [2024-07-29 23:08:06,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.43 | bwd_microstep: 5476.39 | bwd_inner_microstep: 5053.61 | bwd_allreduce_microstep: 422.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3568 [2024-07-29 23:08:15,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.90 | bwd_microstep: 5200.14 | bwd_inner_microstep: 5105.05 | bwd_allreduce_microstep: 95.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3603 [2024-07-29 23:08:24,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.93 | bwd_microstep: 5039.54 | bwd_inner_microstep: 4974.93 | bwd_allreduce_microstep: 64.54 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2167 [2024-07-29 23:08:32,858] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.35 | bwd_microstep: 5236.57 | bwd_inner_microstep: 4831.01 | bwd_allreduce_microstep: 405.49 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-29 23:08:41,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3512.13 | bwd_microstep: 5146.78 | bwd_inner_microstep: 4745.92 | bwd_allreduce_microstep: 400.79 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171 [2024-07-29 23:08:50,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3463.35 | bwd_microstep: 5038.10 | bwd_inner_microstep: 4647.42 | bwd_allreduce_microstep: 390.62 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2135 [2024-07-29 23:08:58,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3482.36 | bwd_microstep: 5038.54 | bwd_inner_microstep: 4645.26 | bwd_allreduce_microstep: 393.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-29 23:09:07,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-29 23:09:07,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.10 | bwd_microstep: 4976.95 | bwd_inner_microstep: 4923.97 | bwd_allreduce_microstep: 52.91 | step_microstep: 180.95 [2024-07-29 23:09:07,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28390.44 | bwd: 41152.99 | bwd_inner: 38927.10 | bwd_allreduce: 2225.41 | step: 181.53 87%|████████▋ | 587/671 [11:25:53<1:37:34, 69.69s/it] {'loss': 1.1115, 'learning_rate': 8.128830363541574e-07, 'epoch': 0.87} 87%|████████▋ | 587/671 [11:25:53<1:37:34, 69.69s/it]dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3956 [2024-07-29 23:09:16,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.36 | 
bwd_microstep: 5411.24 | bwd_inner_microstep: 5336.14 | bwd_allreduce_microstep: 75.03 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2254 [2024-07-29 23:09:25,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.81 | bwd_microstep: 5182.06 | bwd_inner_microstep: 4777.81 | bwd_allreduce_microstep: 404.19 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3807 [2024-07-29 23:09:33,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.96 | bwd_microstep: 5039.23 | bwd_inner_microstep: 5019.86 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3605 [2024-07-29 23:09:42,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.22 | bwd_microstep: 5166.26 | bwd_inner_microstep: 5071.61 | bwd_allreduce_microstep: 94.59 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-29 23:09:51,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.07 | bwd_microstep: 4982.73 | bwd_inner_microstep: 4963.39 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735 [2024-07-29 23:09:59,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3199.78 | bwd_microstep: 4794.19 | bwd_inner_microstep: 4774.82 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.18 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693 [2024-07-29 23:10:08,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3697.36 | bwd_microstep: 4899.84 | bwd_inner_microstep: 4880.43 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3649 [2024-07-29 23:10:16,165] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-29 23:10:16,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3194.86 | bwd_microstep: 4709.52 | bwd_inner_microstep: 4686.27 | bwd_allreduce_microstep: 23.17 | step_microstep: 181.45 [2024-07-29 23:10:16,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28357.32 | bwd: 40185.04 | bwd_inner: 39510.27 | bwd_allreduce: 674.29 | step: 182.13 88%|████████▊ | 588/671 [11:27:02<1:36:04, 69.45s/it] {'loss': 1.1085, 'learning_rate': 7.939026366043346e-07, 'epoch': 0.88} 88%|████████▊ | 588/671 [11:27:02<1:36:04, 69.45s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3608 [2024-07-29 23:10:25,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.07 | bwd_microstep: 5266.45 | bwd_inner_microstep: 5182.49 | bwd_allreduce_microstep: 83.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3594 [2024-07-29 23:10:33,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3593.84 | bwd_microstep: 5199.46 | bwd_inner_microstep: 5119.74 | bwd_allreduce_microstep: 79.66 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2297 [2024-07-29 23:10:42,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.22 | bwd_microstep: 5241.24 | bwd_inner_microstep: 4834.92 | bwd_allreduce_microstep: 406.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2239 [2024-07-29 23:10:51,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3506.47 | bwd_microstep: 5194.66 | bwd_inner_microstep: 4792.33 | bwd_allreduce_microstep: 402.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3788 [2024-07-29 23:11:00,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3738.82 | 
bwd_microstep: 5030.29 | bwd_inner_microstep: 5010.87 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653 [2024-07-29 23:11:08,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.49 | bwd_microstep: 5077.28 | bwd_inner_microstep: 5018.92 | bwd_allreduce_microstep: 58.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717 [2024-07-29 23:11:17,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.00 | bwd_microstep: 5054.92 | bwd_inner_microstep: 5012.56 | bwd_allreduce_microstep: 42.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-07-29 23:11:26,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-29 23:11:26,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.22 | bwd_microstep: 4985.21 | bwd_inner_microstep: 4936.60 | bwd_allreduce_microstep: 48.54 | step_microstep: 180.72 [2024-07-29 23:11:26,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28766.04 | bwd: 41049.48 | bwd_inner: 39908.38 | bwd_allreduce: 1140.62 | step: 181.30 88%|████████▊ | 589/671 [11:28:12<1:35:11, 69.66s/it] {'loss': 1.1869, 'learning_rate': 7.75137290966177e-07, 'epoch': 0.88} 88%|████████▊ | 589/671 [11:28:12<1:35:11, 69.66s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3563 [2024-07-29 23:11:34,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3265.90 | bwd_microstep: 5034.28 | bwd_inner_microstep: 4961.25 | bwd_allreduce_microstep: 72.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3844 [2024-07-29 23:11:43,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.33 | bwd_microstep: 5230.68 | bwd_inner_microstep: 5176.51 | 
bwd_allreduce_microstep: 54.10 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3765 [2024-07-29 23:11:51,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3114.95 | bwd_microstep: 4957.78 | bwd_inner_microstep: 4918.04 | bwd_allreduce_microstep: 39.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609 [2024-07-29 23:12:00,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.96 | bwd_microstep: 5176.63 | bwd_inner_microstep: 5093.49 | bwd_allreduce_microstep: 83.07 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3703 [2024-07-29 23:12:08,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3080.95 | bwd_microstep: 4818.76 | bwd_inner_microstep: 4778.93 | bwd_allreduce_microstep: 39.76 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3718 [2024-07-29 23:12:17,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.39 | bwd_microstep: 5011.23 | bwd_inner_microstep: 4986.87 | bwd_allreduce_microstep: 24.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2213 [2024-07-29 23:12:25,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.57 | bwd_microstep: 5187.98 | bwd_inner_microstep: 4785.01 | bwd_allreduce_microstep: 402.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3719 [2024-07-29 23:12:34,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 23:12:34,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3218.54 | bwd_microstep: 4778.27 | bwd_inner_microstep: 4758.82 | bwd_allreduce_microstep: 19.38 | step_microstep: 181.24 [2024-07-29 23:12:34,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd: 27281.50 | bwd: 40195.58 | bwd_inner: 39458.86 | bwd_allreduce: 736.25 | step: 181.82 88%|████████▊ | 590/671 [11:29:20<1:33:17, 69.10s/it] {'loss': 1.1166, 'learning_rate': 7.565874377975046e-07, 'epoch': 0.88} 88%|████████▊ | 590/671 [11:29:20<1:33:17, 69.10s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2468 [2024-07-29 23:12:43,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3655.19 | bwd_microstep: 5497.76 | bwd_inner_microstep: 5076.43 | bwd_allreduce_microstep: 421.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3816 [2024-07-29 23:12:52,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.58 | bwd_microstep: 5190.04 | bwd_inner_microstep: 5136.40 | bwd_allreduce_microstep: 53.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3804 [2024-07-29 23:13:00,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.24 | bwd_microstep: 5144.46 | bwd_inner_microstep: 5098.22 | bwd_allreduce_microstep: 46.18 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3605 [2024-07-29 23:13:09,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.59 | bwd_microstep: 5180.82 | bwd_inner_microstep: 5096.54 | bwd_allreduce_microstep: 84.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3642 [2024-07-29 23:13:18,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.58 | bwd_microstep: 5174.50 | bwd_inner_microstep: 5091.13 | bwd_allreduce_microstep: 83.30 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3648 [2024-07-29 23:13:26,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3188.79 | bwd_microstep: 4774.02 | bwd_inner_microstep: 4738.46 | 
bwd_allreduce_microstep: 35.49 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3686 [2024-07-29 23:13:35,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.89 | bwd_microstep: 5167.70 | bwd_inner_microstep: 5096.47 | bwd_allreduce_microstep: 71.17 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-29 23:13:44,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.68 [2024-07-29 23:13:44,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3681.29 | bwd_microstep: 4896.39 | bwd_inner_microstep: 4877.00 | bwd_allreduce_microstep: 19.32 | step_microstep: 181.45 [2024-07-29 23:13:44,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28587.05 | bwd: 41025.67 | bwd_inner: 40210.58 | bwd_allreduce: 814.62 | step: 182.01 88%|████████▊ | 591/671 [11:30:30<1:32:28, 69.35s/it] {'loss': 1.124, 'learning_rate': 7.382535104222344e-07, 'epoch': 0.88} 88%|████████▊ | 591/671 [11:30:30<1:32:28, 69.35s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3918 [2024-07-29 23:13:53,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3857.47 | bwd_microstep: 5232.65 | bwd_inner_microstep: 5201.78 | bwd_allreduce_microstep: 30.80 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3859 [2024-07-29 23:14:02,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3786.42 | bwd_microstep: 5106.19 | bwd_inner_microstep: 5086.70 | bwd_allreduce_microstep: 19.41 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2235 [2024-07-29 23:14:10,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3059.07 | bwd_microstep: 5063.90 | bwd_inner_microstep: 4674.38 | bwd_allreduce_microstep: 389.46 | step_microstep: 0.08 dynamic ViT batch size: 
26, images per sample: 13.0, dynamic token length: 3747 [2024-07-29 23:14:18,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3731.81 | bwd_microstep: 4993.77 | bwd_inner_microstep: 4974.46 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722 [2024-07-29 23:14:27,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.18 | bwd_microstep: 4984.53 | bwd_inner_microstep: 4965.10 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3709 [2024-07-29 23:14:36,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3622.39 | bwd_microstep: 5166.15 | bwd_inner_microstep: 5089.47 | bwd_allreduce_microstep: 76.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-29 23:14:45,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3704.24 | bwd_microstep: 4919.19 | bwd_inner_microstep: 4895.17 | bwd_allreduce_microstep: 23.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-07-29 23:14:54,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-07-29 23:14:54,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.91 | bwd_microstep: 4991.28 | bwd_inner_microstep: 4971.96 | bwd_allreduce_microstep: 19.25 | step_microstep: 181.17 [2024-07-29 23:14:54,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29250.38 | bwd: 40457.64 | bwd_inner: 39858.97 | bwd_allreduce: 598.20 | step: 181.74 88%|████████▊ | 592/671 [11:31:40<1:31:35, 69.56s/it] {'loss': 1.1771, 'learning_rate': 7.201359371202698e-07, 'epoch': 0.88} 88%|████████▊ | 592/671 [11:31:40<1:31:35, 69.56s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3925 [2024-07-29 
23:15:02,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3637.93 | bwd_microstep: 5172.30 | bwd_inner_microstep: 5112.07 | bwd_allreduce_microstep: 60.16 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3810 [2024-07-29 23:15:11,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.11 | bwd_microstep: 5182.15 | bwd_inner_microstep: 5132.58 | bwd_allreduce_microstep: 49.51 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2213 [2024-07-29 23:15:20,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.26 | bwd_microstep: 5288.77 | bwd_inner_microstep: 4880.00 | bwd_allreduce_microstep: 408.69 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3617 [2024-07-29 23:15:29,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.43 | bwd_microstep: 5168.89 | bwd_inner_microstep: 5073.34 | bwd_allreduce_microstep: 95.48 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732 [2024-07-29 23:15:37,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3549.91 | bwd_microstep: 5015.55 | bwd_inner_microstep: 4978.52 | bwd_allreduce_microstep: 36.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 [2024-07-29 23:15:46,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3679.05 | bwd_microstep: 4895.46 | bwd_inner_microstep: 4876.05 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-29 23:15:54,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3198.40 | bwd_microstep: 4721.00 | bwd_inner_microstep: 4694.02 | bwd_allreduce_microstep: 26.91 | step_microstep: 0.08 dynamic ViT batch size: 8, images 
per sample: 4.0, dynamic token length: 2143 [2024-07-29 23:16:02,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-29 23:16:02,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3042.77 | bwd_microstep: 4983.97 | bwd_inner_microstep: 4599.61 | bwd_allreduce_microstep: 384.30 | step_microstep: 182.34 [2024-07-29 23:16:02,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27880.77 | bwd: 40428.06 | bwd_inner: 39346.13 | bwd_allreduce: 1081.46 | step: 182.92 88%|████████▊ | 593/671 [11:32:48<1:30:04, 69.28s/it] {'loss': 1.1086, 'learning_rate': 7.022351411174866e-07, 'epoch': 0.88} 88%|████████▊ | 593/671 [11:32:48<1:30:04, 69.28s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3833 [2024-07-29 23:16:11,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3779.27 | bwd_microstep: 5045.26 | bwd_inner_microstep: 5026.18 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3795 [2024-07-29 23:16:20,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.59 | bwd_microstep: 5137.76 | bwd_inner_microstep: 5092.34 | bwd_allreduce_microstep: 45.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3788 [2024-07-29 23:16:29,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.18 | bwd_microstep: 5062.04 | bwd_inner_microstep: 5036.80 | bwd_allreduce_microstep: 25.18 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3761 [2024-07-29 23:16:37,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.09 | bwd_microstep: 5008.60 | bwd_inner_microstep: 4988.04 | bwd_allreduce_microstep: 20.49 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740 [2024-07-29 23:16:46,710] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.91 | bwd_microstep: 5004.80 | bwd_inner_microstep: 4985.47 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-29 23:16:55,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.37 | bwd_microstep: 5049.07 | bwd_inner_microstep: 4992.56 | bwd_allreduce_microstep: 56.44 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3696 [2024-07-29 23:17:03,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.49 | bwd_microstep: 5031.95 | bwd_inner_microstep: 4960.29 | bwd_allreduce_microstep: 71.60 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3685 [2024-07-29 23:17:12,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-29 23:17:12,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.34 | bwd_microstep: 5151.84 | bwd_inner_microstep: 5081.81 | bwd_allreduce_microstep: 69.96 | step_microstep: 181.05 [2024-07-29 23:17:12,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29346.15 | bwd: 40491.31 | bwd_inner: 40163.43 | bwd_allreduce: 327.40 | step: 181.62 89%|████████▊ | 594/671 [11:33:58<1:29:15, 69.55s/it] {'loss': 1.1588, 'learning_rate': 6.845515405758518e-07, 'epoch': 0.88} 89%|████████▊ | 594/671 [11:33:58<1:29:15, 69.55s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2310 [2024-07-29 23:17:21,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3072.41 | bwd_microstep: 5064.00 | bwd_inner_microstep: 4677.77 | bwd_allreduce_microstep: 386.17 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3805 [2024-07-29 23:17:29,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.75 | 
bwd_microstep: 5199.04 | bwd_inner_microstep: 5144.68 | bwd_allreduce_microstep: 54.29 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2204 [2024-07-29 23:17:38,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3058.22 | bwd_microstep: 5020.04 | bwd_inner_microstep: 4632.61 | bwd_allreduce_microstep: 387.37 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3603 [2024-07-29 23:17:46,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.04 | bwd_microstep: 5120.59 | bwd_inner_microstep: 5022.14 | bwd_allreduce_microstep: 98.38 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3792 [2024-07-29 23:17:55,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3556.16 | bwd_microstep: 5093.44 | bwd_inner_microstep: 5038.78 | bwd_allreduce_microstep: 54.59 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3661 [2024-07-29 23:18:04,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.32 | bwd_microstep: 5057.57 | bwd_inner_microstep: 4975.32 | bwd_allreduce_microstep: 82.18 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3690 [2024-07-29 23:18:12,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.20 | bwd_microstep: 5156.80 | bwd_inner_microstep: 5064.20 | bwd_allreduce_microstep: 92.54 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2149 [2024-07-29 23:18:21,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.60 [2024-07-29 23:18:21,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3520.24 | bwd_microstep: 5101.76 | bwd_inner_microstep: 4708.07 | bwd_allreduce_microstep: 393.62 | step_microstep: 181.03 [2024-07-29 23:18:21,683] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27629.25 | bwd: 40813.22 | bwd_inner: 39263.51 | bwd_allreduce: 1549.24 | step: 181.69 89%|████████▊ | 595/671 [11:35:07<1:27:48, 69.32s/it] {'loss': 1.1115, 'learning_rate': 6.670855485836525e-07, 'epoch': 0.89} 89%|████████▊ | 595/671 [11:35:07<1:27:48, 69.32s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3938 [2024-07-29 23:18:30,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.08 | bwd_microstep: 5233.39 | bwd_inner_microstep: 5195.90 | bwd_allreduce_microstep: 37.43 | step_microstep: 0.10 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3848 [2024-07-29 23:18:39,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.05 | bwd_microstep: 5243.20 | bwd_inner_microstep: 5176.26 | bwd_allreduce_microstep: 66.88 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3816 [2024-07-29 23:18:48,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.39 | bwd_microstep: 5058.24 | bwd_inner_microstep: 5038.89 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3756 [2024-07-29 23:18:57,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3748.14 | bwd_microstep: 5002.61 | bwd_inner_microstep: 4983.20 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728 [2024-07-29 23:19:05,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.12 | bwd_microstep: 5166.45 | bwd_inner_microstep: 5110.86 | bwd_allreduce_microstep: 55.52 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2224 [2024-07-29 23:19:14,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.20 | bwd_microstep: 5238.68 
| bwd_inner_microstep: 4832.23 | bwd_allreduce_microstep: 406.39 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2146 [2024-07-29 23:19:23,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3464.44 | bwd_microstep: 5125.25 | bwd_inner_microstep: 4728.72 | bwd_allreduce_microstep: 396.46 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3689 [2024-07-29 23:19:32,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.41 [2024-07-29 23:19:32,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.80 | bwd_microstep: 5088.29 | bwd_inner_microstep: 5014.04 | bwd_allreduce_microstep: 74.18 | step_microstep: 180.79 [2024-07-29 23:19:32,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28981.12 | bwd: 41156.09 | bwd_inner: 40080.04 | bwd_allreduce: 1075.58 | step: 181.37 89%|████████▉ | 596/671 [11:36:18<1:27:04, 69.66s/it] {'loss': 1.1282, 'learning_rate': 6.498375731458529e-07, 'epoch': 0.89} 89%|████████▉ | 596/671 [11:36:18<1:27:04, 69.66s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3917 [2024-07-29 23:19:41,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.96 | bwd_microstep: 5426.44 | bwd_inner_microstep: 5358.57 | bwd_allreduce_microstep: 67.81 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3822 [2024-07-29 23:19:50,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.26 | bwd_microstep: 5168.68 | bwd_inner_microstep: 5100.97 | bwd_allreduce_microstep: 67.64 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2216 [2024-07-29 23:19:58,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.70 | bwd_microstep: 5244.50 | bwd_inner_microstep: 4839.07 | bwd_allreduce_microstep: 405.36 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3638 [2024-07-29 23:20:07,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.45 | bwd_microstep: 4832.03 | bwd_inner_microstep: 4790.32 | bwd_allreduce_microstep: 41.64 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2227 [2024-07-29 23:20:15,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3552.93 | bwd_microstep: 5243.26 | bwd_inner_microstep: 4836.19 | bwd_allreduce_microstep: 407.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3686 [2024-07-29 23:20:24,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.32 | bwd_microstep: 5014.64 | bwd_inner_microstep: 4973.43 | bwd_allreduce_microstep: 41.14 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2166 [2024-07-29 23:20:33,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.65 | bwd_microstep: 5234.22 | bwd_inner_microstep: 4828.32 | bwd_allreduce_microstep: 405.83 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3718 [2024-07-29 23:20:42,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 23:20:42,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3983.58 | bwd_microstep: 4979.01 | bwd_inner_microstep: 4959.63 | bwd_allreduce_microstep: 19.31 | step_microstep: 181.31 [2024-07-29 23:20:42,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28943.76 | bwd: 41142.76 | bwd_inner: 39686.46 | bwd_allreduce: 1455.82 | step: 181.88 89%|████████▉ | 597/671 [11:37:28<1:26:11, 69.89s/it] {'loss': 1.16, 'learning_rate': 6.32808017174551e-07, 'epoch': 0.89} 89%|████████▉ | 597/671 [11:37:28<1:26:11, 69.89s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic 
token length: 2031 [2024-07-29 23:20:51,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3485.35 | bwd_microstep: 5126.84 | bwd_inner_microstep: 4731.49 | bwd_allreduce_microstep: 395.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3812 [2024-07-29 23:20:59,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3740.61 | bwd_microstep: 5029.33 | bwd_inner_microstep: 5009.98 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2268 [2024-07-29 23:21:08,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3073.52 | bwd_microstep: 5065.29 | bwd_inner_microstep: 4673.36 | bwd_allreduce_microstep: 391.86 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2210 [2024-07-29 23:21:16,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3037.32 | bwd_microstep: 4975.99 | bwd_inner_microstep: 4592.23 | bwd_allreduce_microstep: 383.70 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3747 [2024-07-29 23:21:24,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.71 | bwd_microstep: 5150.71 | bwd_inner_microstep: 5097.22 | bwd_allreduce_microstep: 53.42 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-29 23:21:33,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.71 | bwd_microstep: 5012.57 | bwd_inner_microstep: 4988.38 | bwd_allreduce_microstep: 24.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3690 [2024-07-29 23:21:42,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.96 | bwd_microstep: 5071.30 | bwd_inner_microstep: 5014.52 | bwd_allreduce_microstep: 56.72 | step_microstep: 0.08 
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728 [2024-07-29 23:21:51,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-29 23:21:51,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.36 | bwd_microstep: 5172.60 | bwd_inner_microstep: 5118.69 | bwd_allreduce_microstep: 53.84 | step_microstep: 180.67 [2024-07-29 23:21:51,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27894.46 | bwd: 40604.61 | bwd_inner: 39225.83 | bwd_allreduce: 1378.31 | step: 181.24 89%|████████▉ | 598/671 [11:38:37<1:24:38, 69.57s/it] {'loss': 1.1252, 'learning_rate': 6.159972784795798e-07, 'epoch': 0.89} 89%|████████▉ | 598/671 [11:38:37<1:24:38, 69.57s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2373 [2024-07-29 23:22:00,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.31 | bwd_microstep: 5160.76 | bwd_inner_microstep: 4764.70 | bwd_allreduce_microstep: 395.99 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2040 [2024-07-29 23:22:08,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.63 | bwd_microstep: 5175.60 | bwd_inner_microstep: 4773.86 | bwd_allreduce_microstep: 401.67 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3731 [2024-07-29 23:22:17,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.57 | bwd_microstep: 5108.55 | bwd_inner_microstep: 5058.69 | bwd_allreduce_microstep: 49.79 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2203 [2024-07-29 23:22:26,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3576.49 | bwd_microstep: 5248.06 | bwd_inner_microstep: 4840.60 | bwd_allreduce_microstep: 407.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 
3705 [2024-07-29 23:22:35,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.87 | bwd_microstep: 5098.62 | bwd_inner_microstep: 5034.29 | bwd_allreduce_microstep: 64.26 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2151 [2024-07-29 23:22:43,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.54 | bwd_microstep: 5218.12 | bwd_inner_microstep: 4811.21 | bwd_allreduce_microstep: 406.85 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3699 [2024-07-29 23:22:52,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3670.74 | bwd_microstep: 4878.11 | bwd_inner_microstep: 4858.74 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3693 [2024-07-29 23:23:01,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 23:23:01,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.78 | bwd_microstep: 5049.67 | bwd_inner_microstep: 4990.59 | bwd_allreduce_microstep: 59.01 | step_microstep: 180.84 [2024-07-29 23:23:01,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28582.83 | bwd: 40937.47 | bwd_inner: 39132.62 | bwd_allreduce: 1804.37 | step: 181.41 89%|████████▉ | 599/671 [11:39:47<1:23:34, 69.65s/it] {'loss': 1.1086, 'learning_rate': 5.994057497592054e-07, 'epoch': 0.89} 89%|████████▉ | 599/671 [11:39:47<1:23:34, 69.65s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2374 [2024-07-29 23:23:10,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.48 | bwd_microstep: 5394.40 | bwd_inner_microstep: 4979.55 | bwd_allreduce_microstep: 414.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3814 [2024-07-29 23:23:19,097] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3751.44 | bwd_microstep: 5041.94 | bwd_inner_microstep: 5022.57 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3785 [2024-07-29 23:23:27,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.33 | bwd_microstep: 5062.76 | bwd_inner_microstep: 5023.82 | bwd_allreduce_microstep: 38.87 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617 [2024-07-29 23:23:35,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.10 | bwd_microstep: 4847.06 | bwd_inner_microstep: 4803.46 | bwd_allreduce_microstep: 43.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3737 [2024-07-29 23:23:44,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.05 | bwd_microstep: 5076.07 | bwd_inner_microstep: 5033.15 | bwd_allreduce_microstep: 42.85 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3624 [2024-07-29 23:23:52,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3495.51 | bwd_microstep: 4972.90 | bwd_inner_microstep: 4906.37 | bwd_allreduce_microstep: 66.46 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3657 [2024-07-29 23:24:01,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.12 | bwd_microstep: 5106.89 | bwd_inner_microstep: 5038.15 | bwd_allreduce_microstep: 68.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3719 [2024-07-29 23:24:10,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 23:24:10,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.62 | bwd_microstep: 4994.14 | bwd_inner_microstep: 4974.75 | bwd_allreduce_microstep: 19.33 | 
step_microstep: 180.49 [2024-07-29 23:24:10,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28562.55 | bwd: 40496.14 | bwd_inner: 39781.77 | bwd_allreduce: 713.89 | step: 181.08 89%|████████▉ | 600/671 [11:40:56<1:22:19, 69.57s/it] {'loss': 1.1208, 'learning_rate': 5.830338185909545e-07, 'epoch': 0.89} 89%|████████▉ | 600/671 [11:40:56<1:22:19, 69.57s/it][INFO|trainer.py:2936] 2024-07-29 23:24:36,885 >> Saving model checkpoint to /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-600 [INFO|configuration_utils.py:473] 2024-07-29 23:24:36,887 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-600/config.json [INFO|configuration_utils.py:594] 2024-07-29 23:24:36,888 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-600/generation_config.json [INFO|modeling_utils.py:2501] 2024-07-29 23:25:33,486 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 11 checkpoint shards. You can find where each parameters has been saved in the index located at /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-600/model.safetensors.index.json. [INFO|tokenization_utils_base.py:2433] 2024-07-29 23:25:33,488 >> tokenizer config file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-600/tokenizer_config.json [INFO|tokenization_utils_base.py:2442] 2024-07-29 23:25:33,489 >> Special tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-600/special_tokens_map.json [INFO|tokenization_utils_base.py:2493] 2024-07-29 23:25:33,489 >> added tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-600/added_tokens.json [2024-07-29 23:25:35,151] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step600 is about to be saved! 
[2024-07-29 23:25:35,612] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-600/global_step600/zero_pp_rank_0_mp_rank_00_model_states.pt [2024-07-29 23:25:35,612] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-600/global_step600/zero_pp_rank_0_mp_rank_00_model_states.pt... [2024-07-29 23:25:37,513] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-600/global_step600/zero_pp_rank_0_mp_rank_00_model_states.pt. [2024-07-29 23:25:37,795] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-600/global_step600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-07-29 23:26:29,228] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-600/global_step600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-07-29 23:26:29,229] [INFO] [engine.py:3431:_save_zero_checkpoint] zero checkpoint saved /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tmp-checkpoint-600/global_step600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-07-29 23:26:35,060] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step600 is ready now! 
[INFO|trainer.py:3028] 2024-07-29 23:26:35,088 >> Deleting older checkpoint [/data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/checkpoint-200] due to args.save_total_limit dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3678 [2024-07-29 23:27:00,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3673.52 | bwd_microstep: 5330.62 | bwd_inner_microstep: 5240.92 | bwd_allreduce_microstep: 89.62 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2276 [2024-07-29 23:27:08,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3004.70 | bwd_microstep: 4973.81 | bwd_inner_microstep: 4589.48 | bwd_allreduce_microstep: 384.26 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3743 [2024-07-29 23:27:17,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.36 | bwd_microstep: 5135.37 | bwd_inner_microstep: 5082.11 | bwd_allreduce_microstep: 53.19 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2206 [2024-07-29 23:27:24,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3006.06 | bwd_microstep: 4915.49 | bwd_inner_microstep: 4535.99 | bwd_allreduce_microstep: 379.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2194 [2024-07-29 23:27:33,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.65 | bwd_microstep: 5156.55 | bwd_inner_microstep: 4755.90 | bwd_allreduce_microstep: 400.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3643 [2024-07-29 23:27:41,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3166.78 | bwd_microstep: 4686.83 | bwd_inner_microstep: 4664.43 | bwd_allreduce_microstep: 22.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3713 [2024-07-29 23:27:50,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.56 | bwd_microstep: 4949.07 | bwd_inner_microstep: 4916.19 | bwd_allreduce_microstep: 32.81 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-29 23:27:58,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 23:27:58,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3039.03 | bwd_microstep: 4997.11 | bwd_inner_microstep: 4612.66 | bwd_allreduce_microstep: 384.39 | step_microstep: 181.55 [2024-07-29 23:27:58,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 26575.57 | bwd: 40144.83 | bwd_inner: 38397.61 | bwd_allreduce: 1746.73 | step: 182.26 90%|████████▉ | 601/671 [11:44:44<2:16:30, 117.00s/it] {'loss': 1.1315, 'learning_rate': 5.668818674225696e-07, 'epoch': 0.89} 90%|████████▉ | 601/671 [11:44:44<2:16:30, 117.00s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3898 [2024-07-29 23:28:07,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.03 | bwd_microstep: 5191.45 | bwd_inner_microstep: 5153.15 | bwd_allreduce_microstep: 38.23 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2321 [2024-07-29 23:28:15,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.60 | bwd_microstep: 5263.86 | bwd_inner_microstep: 4857.09 | bwd_allreduce_microstep: 406.71 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2213 [2024-07-29 23:28:24,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3465.50 | bwd_microstep: 5100.41 | bwd_inner_microstep: 4704.82 | bwd_allreduce_microstep: 395.53 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2208 [2024-07-29 23:28:33,300] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.62 | bwd_microstep: 5187.20 | bwd_inner_microstep: 4782.17 | bwd_allreduce_microstep: 404.96 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 1688 [2024-07-29 23:28:41,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3020.11 | bwd_microstep: 5021.12 | bwd_inner_microstep: 4635.48 | bwd_allreduce_microstep: 385.57 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2202 [2024-07-29 23:28:50,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.12 | bwd_microstep: 5180.59 | bwd_inner_microstep: 4779.91 | bwd_allreduce_microstep: 400.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-29 23:28:58,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.28 | bwd_microstep: 5074.46 | bwd_inner_microstep: 5012.83 | bwd_allreduce_microstep: 61.57 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3679 [2024-07-29 23:29:07,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 23:29:07,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.04 | bwd_microstep: 5259.77 | bwd_inner_microstep: 5108.34 | bwd_allreduce_microstep: 151.36 | step_microstep: 188.95 [2024-07-29 23:29:07,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28041.19 | bwd: 41278.83 | bwd_inner: 39033.73 | bwd_allreduce: 2244.64 | step: 189.65 90%|████████▉ | 602/671 [11:45:53<1:58:13, 102.80s/it] {'loss': 1.0897, 'learning_rate': 5.509502735630601e-07, 'epoch': 0.9} 90%|████████▉ | 602/671 [11:45:53<1:58:13, 102.80s/it]dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3128 [2024-07-29 23:29:16,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.97 | 
bwd_microstep: 5285.71 | bwd_inner_microstep: 4993.71 | bwd_allreduce_microstep: 291.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3808 [2024-07-29 23:29:25,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.97 | bwd_microstep: 5223.31 | bwd_inner_microstep: 5167.31 | bwd_allreduce_microstep: 55.93 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2073 [2024-07-29 23:29:33,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3031.24 | bwd_microstep: 4972.93 | bwd_inner_microstep: 4589.11 | bwd_allreduce_microstep: 383.76 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650 [2024-07-29 23:29:42,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.15 | bwd_microstep: 5183.51 | bwd_inner_microstep: 5102.87 | bwd_allreduce_microstep: 80.57 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3758 [2024-07-29 23:29:51,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.40 | bwd_microstep: 5177.09 | bwd_inner_microstep: 5121.09 | bwd_allreduce_microstep: 55.94 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2249 [2024-07-29 23:29:59,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3027.70 | bwd_microstep: 4979.17 | bwd_inner_microstep: 4596.13 | bwd_allreduce_microstep: 382.98 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-07-29 23:30:08,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.07 | bwd_microstep: 5061.54 | bwd_inner_microstep: 5003.53 | bwd_allreduce_microstep: 57.94 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2159 [2024-07-29 23:30:16,882] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 23:30:16,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.97 | bwd_microstep: 5130.10 | bwd_inner_microstep: 4732.24 | bwd_allreduce_microstep: 397.78 | step_microstep: 182.15 [2024-07-29 23:30:16,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27584.39 | bwd: 41013.34 | bwd_inner: 39305.93 | bwd_allreduce: 1706.94 | step: 182.86 90%|████████▉ | 603/671 [11:47:02<1:44:59, 92.64s/it] {'loss': 1.1225, 'learning_rate': 5.352394091739022e-07, 'epoch': 0.9} 90%|████████▉ | 603/671 [11:47:02<1:44:59, 92.64s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3556 [2024-07-29 23:30:25,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.14 | bwd_microstep: 5251.68 | bwd_inner_microstep: 5109.11 | bwd_allreduce_microstep: 142.51 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2262 [2024-07-29 23:30:34,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.68 | bwd_microstep: 5187.05 | bwd_inner_microstep: 4783.67 | bwd_allreduce_microstep: 403.32 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3748 [2024-07-29 23:30:43,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.13 | bwd_microstep: 5184.29 | bwd_inner_microstep: 5108.43 | bwd_allreduce_microstep: 75.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-29 23:30:52,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.85 | bwd_microstep: 5175.69 | bwd_inner_microstep: 5089.60 | bwd_allreduce_microstep: 86.03 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3617 [2024-07-29 23:31:00,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3229.41 | 
bwd_microstep: 4845.30 | bwd_inner_microstep: 4799.03 | bwd_allreduce_microstep: 46.21 | step_microstep: 0.18 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2113 [2024-07-29 23:31:08,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3496.28 | bwd_microstep: 5103.90 | bwd_inner_microstep: 4707.66 | bwd_allreduce_microstep: 396.17 | step_microstep: 0.11 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-29 23:31:17,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.95 | bwd_microstep: 5067.32 | bwd_inner_microstep: 5007.62 | bwd_allreduce_microstep: 59.64 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3708 [2024-07-29 23:31:26,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-29 23:31:26,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3684.27 | bwd_microstep: 4916.87 | bwd_inner_microstep: 4897.50 | bwd_allreduce_microstep: 19.30 | step_microstep: 181.04 [2024-07-29 23:31:26,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28404.63 | bwd: 40732.08 | bwd_inner: 39502.55 | bwd_allreduce: 1229.05 | step: 181.75 90%|█████████ | 604/671 [11:48:12<1:35:40, 85.69s/it] {'loss': 1.1742, 'learning_rate': 5.197496412603365e-07, 'epoch': 0.9} 90%|█████████ | 604/671 [11:48:12<1:35:40, 85.69s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3883 [2024-07-29 23:31:35,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3786.66 | bwd_microstep: 5139.80 | bwd_inner_microstep: 5120.69 | bwd_allreduce_microstep: 19.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3802 [2024-07-29 23:31:44,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3777.95 | bwd_microstep: 5048.61 | bwd_inner_microstep: 5027.03 | 
bwd_allreduce_microstep: 21.52 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3790 [2024-07-29 23:31:52,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.61 | bwd_microstep: 5033.07 | bwd_inner_microstep: 5013.61 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3637 [2024-07-29 23:32:01,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.67 | bwd_microstep: 5122.92 | bwd_inner_microstep: 5053.33 | bwd_allreduce_microstep: 69.52 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3725 [2024-07-29 23:32:10,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.49 | bwd_microstep: 4990.98 | bwd_inner_microstep: 4971.65 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-29 23:32:19,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.18 | bwd_microstep: 5139.46 | bwd_inner_microstep: 5069.17 | bwd_allreduce_microstep: 70.22 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2169 [2024-07-29 23:32:28,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3530.39 | bwd_microstep: 5453.32 | bwd_inner_microstep: 4910.93 | bwd_allreduce_microstep: 542.33 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1135 [2024-07-29 23:32:37,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-29 23:32:37,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3473.50 | bwd_microstep: 5148.31 | bwd_inner_microstep: 4749.49 | bwd_allreduce_microstep: 398.75 | step_microstep: 181.43 [2024-07-29 23:32:37,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd: 29248.35 | bwd: 41076.44 | bwd_inner: 39915.86 | bwd_allreduce: 1160.10 | step: 182.13 90%|█████████ | 605/671 [11:49:22<1:29:17, 81.18s/it] {'loss': 1.1343, 'learning_rate': 5.044813316627994e-07, 'epoch': 0.9} 90%|█████████ | 605/671 [11:49:22<1:29:17, 81.18s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3557 [2024-07-29 23:32:45,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.89 | bwd_microstep: 5203.82 | bwd_inner_microstep: 5113.78 | bwd_allreduce_microstep: 89.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3831 [2024-07-29 23:32:54,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.91 | bwd_microstep: 5098.78 | bwd_inner_microstep: 5073.60 | bwd_allreduce_microstep: 25.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3610 [2024-07-29 23:33:03,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.59 | bwd_microstep: 5149.47 | bwd_inner_microstep: 5076.28 | bwd_allreduce_microstep: 73.12 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735 [2024-07-29 23:33:12,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3743.25 | bwd_microstep: 5032.80 | bwd_inner_microstep: 5008.97 | bwd_allreduce_microstep: 23.77 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3722 [2024-07-29 23:33:21,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.03 | bwd_microstep: 4996.46 | bwd_inner_microstep: 4977.11 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-07-29 23:33:29,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3437.95 | bwd_microstep: 5005.62 | bwd_inner_microstep: 4615.17 | 
bwd_allreduce_microstep: 390.37 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-07-29 23:33:38,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.62 | bwd_microstep: 5004.21 | bwd_inner_microstep: 4984.83 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-07-29 23:33:47,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 23:33:47,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.09 | bwd_microstep: 5315.82 | bwd_inner_microstep: 5175.73 | bwd_allreduce_microstep: 140.02 | step_microstep: 181.10 [2024-07-29 23:33:47,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29200.23 | bwd: 40806.96 | bwd_inner: 40025.43 | bwd_allreduce: 781.04 | step: 181.67 90%|█████████ | 606/671 [11:50:33<1:24:25, 77.93s/it] {'loss': 1.1333, 'learning_rate': 4.894348370484648e-07, 'epoch': 0.9} 90%|█████████ | 606/671 [11:50:33<1:24:25, 77.93s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3972 [2024-07-29 23:33:56,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.78 | bwd_microstep: 5333.73 | bwd_inner_microstep: 5268.43 | bwd_allreduce_microstep: 65.23 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-29 23:34:05,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.50 | bwd_microstep: 5210.45 | bwd_inner_microstep: 5130.49 | bwd_allreduce_microstep: 79.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3598 [2024-07-29 23:34:14,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.71 | bwd_microstep: 5155.25 | bwd_inner_microstep: 5076.14 | bwd_allreduce_microstep: 79.05 | step_microstep: 0.10 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3603 [2024-07-29 23:34:22,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3227.99 | bwd_microstep: 4876.34 | bwd_inner_microstep: 4826.42 | bwd_allreduce_microstep: 49.86 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737 [2024-07-29 23:34:30,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.36 | bwd_microstep: 4988.98 | bwd_inner_microstep: 4969.67 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3705 [2024-07-29 23:34:38,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3078.76 | bwd_microstep: 4854.83 | bwd_inner_microstep: 4810.11 | bwd_allreduce_microstep: 44.66 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2195 [2024-07-29 23:34:47,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3538.23 | bwd_microstep: 5107.80 | bwd_inner_microstep: 4712.91 | bwd_allreduce_microstep: 394.82 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3211 [2024-07-29 23:34:55,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 23:34:55,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3066.49 | bwd_microstep: 4890.50 | bwd_inner_microstep: 4797.37 | bwd_allreduce_microstep: 93.07 | step_microstep: 181.11 [2024-07-29 23:34:55,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27573.73 | bwd: 40417.87 | bwd_inner: 39591.46 | bwd_allreduce: 825.93 | step: 181.81 90%|█████████ | 607/671 [11:51:41<1:20:02, 75.05s/it] {'loss': 1.154, 'learning_rate': 4.746105089029229e-07, 'epoch': 0.9} 90%|█████████ | 607/671 [11:51:41<1:20:02, 75.05s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-29 23:35:04,601] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.32 | bwd_microstep: 5272.10 | bwd_inner_microstep: 5184.33 | bwd_allreduce_microstep: 87.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3575 [2024-07-29 23:35:13,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.04 | bwd_microstep: 5230.03 | bwd_inner_microstep: 5138.89 | bwd_allreduce_microstep: 91.08 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3837 [2024-07-29 23:35:22,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3776.79 | bwd_microstep: 5107.57 | bwd_inner_microstep: 5085.83 | bwd_allreduce_microstep: 21.67 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3610 [2024-07-29 23:35:30,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3115.09 | bwd_microstep: 4991.00 | bwd_inner_microstep: 4918.94 | bwd_allreduce_microstep: 72.00 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3762 [2024-07-29 23:35:39,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.69 | bwd_microstep: 4995.41 | bwd_inner_microstep: 4976.02 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-29 23:35:48,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.84 | bwd_microstep: 5170.66 | bwd_inner_microstep: 5091.91 | bwd_allreduce_microstep: 78.68 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2194 [2024-07-29 23:35:56,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.91 | bwd_microstep: 5143.16 | bwd_inner_microstep: 4744.02 | bwd_allreduce_microstep: 399.07 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 
7.0, dynamic token length: 2154 [2024-07-29 23:36:04,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 23:36:04,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3023.82 | bwd_microstep: 4924.79 | bwd_inner_microstep: 4546.60 | bwd_allreduce_microstep: 378.12 | step_microstep: 208.14 [2024-07-29 23:36:04,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28050.40 | bwd: 40834.71 | bwd_inner: 39686.49 | bwd_allreduce: 1147.74 | step: 208.84 91%|█████████ | 608/671 [11:52:50<1:16:58, 73.30s/it] {'loss': 1.0746, 'learning_rate': 4.6000869352195607e-07, 'epoch': 0.9} 91%|█████████ | 608/671 [11:52:50<1:16:58, 73.30s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2361 [2024-07-29 23:36:13,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.92 | bwd_microstep: 5241.85 | bwd_inner_microstep: 4837.26 | bwd_allreduce_microstep: 404.53 | step_microstep: 0.10 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3586 [2024-07-29 23:36:22,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.95 | bwd_microstep: 5131.55 | bwd_inner_microstep: 5049.12 | bwd_allreduce_microstep: 82.37 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3782 [2024-07-29 23:36:31,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.47 | bwd_microstep: 5145.78 | bwd_inner_microstep: 5079.46 | bwd_allreduce_microstep: 66.26 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3784 [2024-07-29 23:36:40,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.19 | bwd_microstep: 5026.04 | bwd_inner_microstep: 5006.66 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3829 [2024-07-29 23:36:48,846] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.23 | bwd_microstep: 5181.60 | bwd_inner_microstep: 5128.13 | bwd_allreduce_microstep: 53.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-29 23:36:57,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3876.22 | bwd_microstep: 4996.71 | bwd_inner_microstep: 4977.29 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3708 [2024-07-29 23:37:06,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.89 | bwd_microstep: 5084.62 | bwd_inner_microstep: 5026.50 | bwd_allreduce_microstep: 58.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-29 23:37:15,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-29 23:37:15,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3704.62 | bwd_microstep: 4893.93 | bwd_inner_microstep: 4874.47 | bwd_allreduce_microstep: 19.38 | step_microstep: 181.70 [2024-07-29 23:37:15,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29250.40 | bwd: 40702.05 | bwd_inner: 39978.84 | bwd_allreduce: 722.73 | step: 182.28 91%|█████████ | 609/671 [11:54:01<1:14:48, 72.40s/it] {'loss': 1.1104, 'learning_rate': 4.4562973200346413e-07, 'epoch': 0.91} 91%|█████████ | 609/671 [11:54:01<1:14:48, 72.40s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3935 [2024-07-29 23:37:24,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3665.34 | bwd_microstep: 5348.30 | bwd_inner_microstep: 5291.55 | bwd_allreduce_microstep: 56.68 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2292 [2024-07-29 23:37:33,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.60 | 
bwd_microstep: 5311.31 | bwd_inner_microstep: 4897.98 | bwd_allreduce_microstep: 413.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3792 [2024-07-29 23:37:41,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.81 | bwd_microstep: 5166.10 | bwd_inner_microstep: 5116.42 | bwd_allreduce_microstep: 49.61 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3634 [2024-07-29 23:37:50,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.65 | bwd_microstep: 5204.28 | bwd_inner_microstep: 5101.72 | bwd_allreduce_microstep: 102.48 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2227 [2024-07-29 23:37:58,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3024.47 | bwd_microstep: 5015.97 | bwd_inner_microstep: 4630.75 | bwd_allreduce_microstep: 385.16 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3707 [2024-07-29 23:38:07,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.38 | bwd_microstep: 5083.27 | bwd_inner_microstep: 5038.75 | bwd_allreduce_microstep: 44.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728 [2024-07-29 23:38:15,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3210.80 | bwd_microstep: 4801.73 | bwd_inner_microstep: 4782.24 | bwd_allreduce_microstep: 19.42 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-29 23:38:24,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.44 [2024-07-29 23:38:24,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.20 | bwd_microstep: 5178.45 | bwd_inner_microstep: 5101.34 | bwd_allreduce_microstep: 77.04 | step_microstep: 181.44 [2024-07-29 
23:38:24,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28048.15 | bwd: 41109.40 | bwd_inner: 39960.68 | bwd_allreduce: 1148.22 | step: 182.14 91%|█████████ | 610/671 [11:55:10<1:12:42, 71.52s/it] {'loss': 1.1653, 'learning_rate': 4.314739602394813e-07, 'epoch': 0.91} 91%|█████████ | 610/671 [11:55:10<1:12:42, 71.52s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3552 [2024-07-29 23:38:33,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3675.49 | bwd_microstep: 5341.40 | bwd_inner_microstep: 5203.49 | bwd_allreduce_microstep: 137.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3589 [2024-07-29 23:38:42,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.18 | bwd_microstep: 5109.66 | bwd_inner_microstep: 5034.51 | bwd_allreduce_microstep: 75.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-29 23:38:51,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.26 | bwd_microstep: 5174.10 | bwd_inner_microstep: 5098.77 | bwd_allreduce_microstep: 75.25 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2271 [2024-07-29 23:39:00,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.74 | bwd_microstep: 5227.92 | bwd_inner_microstep: 4822.89 | bwd_allreduce_microstep: 404.96 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3664 [2024-07-29 23:39:08,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.90 | bwd_microstep: 5156.63 | bwd_inner_microstep: 5079.01 | bwd_allreduce_microstep: 77.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735 [2024-07-29 23:39:17,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3721.16 | 
bwd_microstep: 4990.58 | bwd_inner_microstep: 4971.23 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3729 [2024-07-29 23:39:26,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.34 | bwd_microstep: 4869.20 | bwd_inner_microstep: 4843.94 | bwd_allreduce_microstep: 25.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3710 [2024-07-29 23:39:34,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 23:39:34,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3203.94 | bwd_microstep: 4711.19 | bwd_inner_microstep: 4690.01 | bwd_allreduce_microstep: 21.11 | step_microstep: 181.02 [2024-07-29 23:39:34,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28522.90 | bwd: 40580.64 | bwd_inner: 39743.79 | bwd_allreduce: 836.38 | step: 181.60 91%|█████████ | 611/671 [11:56:20<1:10:53, 70.90s/it] {'loss': 1.1645, 'learning_rate': 4.1754170890833777e-07, 'epoch': 0.91} 91%|█████████ | 611/671 [11:56:20<1:10:53, 70.90s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2465 [2024-07-29 23:39:43,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3627.68 | bwd_microstep: 5394.66 | bwd_inner_microstep: 4979.02 | bwd_allreduce_microstep: 415.58 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3764 [2024-07-29 23:39:51,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3240.27 | bwd_microstep: 4822.91 | bwd_inner_microstep: 4797.32 | bwd_allreduce_microstep: 25.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3605 [2024-07-29 23:39:59,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.08 | bwd_microstep: 5112.32 | bwd_inner_microstep: 5042.57 | 
bwd_allreduce_microstep: 69.68 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3767 [2024-07-29 23:40:08,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.23 | bwd_microstep: 5102.54 | bwd_inner_microstep: 5060.84 | bwd_allreduce_microstep: 41.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-07-29 23:40:16,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3237.37 | bwd_microstep: 4866.92 | bwd_inner_microstep: 4822.42 | bwd_allreduce_microstep: 44.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3752 [2024-07-29 23:40:25,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.71 | bwd_microstep: 5069.55 | bwd_inner_microstep: 5000.39 | bwd_allreduce_microstep: 69.09 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3677 [2024-07-29 23:40:34,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.74 | bwd_microstep: 5050.07 | bwd_inner_microstep: 4975.77 | bwd_allreduce_microstep: 74.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3700 [2024-07-29 23:40:42,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.64 [2024-07-29 23:40:42,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.64 | bwd_microstep: 5056.50 | bwd_inner_microstep: 4998.65 | bwd_allreduce_microstep: 57.79 | step_microstep: 181.54 [2024-07-29 23:40:42,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27914.63 | bwd: 40475.45 | bwd_inner: 39676.92 | bwd_allreduce: 798.07 | step: 182.12 91%|█████████ | 612/671 [11:57:28<1:09:04, 70.24s/it] {'loss': 1.1725, 'learning_rate': 4.038333034669406e-07, 'epoch': 0.91} 91%|█████████ | 612/671 [11:57:28<1:09:04, 70.24s/it]dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3644 [2024-07-29 23:40:51,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.72 | bwd_microstep: 5302.52 | bwd_inner_microstep: 5208.67 | bwd_allreduce_microstep: 93.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3818 [2024-07-29 23:41:00,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.10 | bwd_microstep: 5087.92 | bwd_inner_microstep: 5049.04 | bwd_allreduce_microstep: 38.82 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2224 [2024-07-29 23:41:09,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.88 | bwd_microstep: 5255.34 | bwd_inner_microstep: 4845.06 | bwd_allreduce_microstep: 410.21 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741 [2024-07-29 23:41:18,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.54 | bwd_microstep: 4990.40 | bwd_inner_microstep: 4971.06 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3807 [2024-07-29 23:41:26,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.04 | bwd_microstep: 5039.09 | bwd_inner_microstep: 5019.75 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2103 [2024-07-29 23:41:35,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3468.79 | bwd_microstep: 5122.89 | bwd_inner_microstep: 4728.39 | bwd_allreduce_microstep: 394.43 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156 [2024-07-29 23:41:44,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.87 | bwd_microstep: 5224.64 | bwd_inner_microstep: 4818.55 | 
bwd_allreduce_microstep: 406.02 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3667 [2024-07-29 23:41:52,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 23:41:52,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.33 | bwd_microstep: 5004.74 | bwd_inner_microstep: 4945.91 | bwd_allreduce_microstep: 58.76 | step_microstep: 180.84 [2024-07-29 23:41:52,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28787.18 | bwd: 41027.52 | bwd_inner: 39586.37 | bwd_allreduce: 1440.68 | step: 181.52 91%|█████████▏| 613/671 [11:58:38<1:07:52, 70.21s/it] {'loss': 1.1453, 'learning_rate': 3.903490641431573e-07, 'epoch': 0.91} 91%|█████████▏| 613/671 [11:58:38<1:07:52, 70.21s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3843 [2024-07-29 23:42:02,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3830.71 | bwd_microstep: 5220.16 | bwd_inner_microstep: 5185.92 | bwd_allreduce_microstep: 34.17 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3608 [2024-07-29 23:42:10,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.25 | bwd_microstep: 5154.71 | bwd_inner_microstep: 5059.76 | bwd_allreduce_microstep: 94.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3802 [2024-07-29 23:42:19,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.46 | bwd_microstep: 5169.45 | bwd_inner_microstep: 5119.05 | bwd_allreduce_microstep: 50.33 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3617 [2024-07-29 23:42:27,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3063.59 | bwd_microstep: 4854.56 | bwd_inner_microstep: 4807.67 | bwd_allreduce_microstep: 46.82 | step_microstep: 0.09 dynamic ViT batch size: 
14, images per sample: 7.0, dynamic token length: 3640 [2024-07-29 23:42:35,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3108.56 | bwd_microstep: 4937.84 | bwd_inner_microstep: 4877.41 | bwd_allreduce_microstep: 60.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-29 23:42:43,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3204.41 | bwd_microstep: 4774.63 | bwd_inner_microstep: 4741.24 | bwd_allreduce_microstep: 33.33 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3689 [2024-07-29 23:42:52,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3693.40 | bwd_microstep: 4890.11 | bwd_inner_microstep: 4870.71 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-29 23:43:01,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 23:43:01,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.23 | bwd_microstep: 5129.62 | bwd_inner_microstep: 5054.10 | bwd_allreduce_microstep: 75.45 | step_microstep: 180.58 [2024-07-29 23:43:01,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27671.52 | bwd: 40131.05 | bwd_inner: 39715.80 | bwd_allreduce: 414.78 | step: 181.17 92%|█████████▏| 614/671 [11:59:47<1:06:06, 69.59s/it] {'loss': 1.1292, 'learning_rate': 3.770893059283465e-07, 'epoch': 0.91} 92%|█████████▏| 614/671 [11:59:47<1:06:06, 69.59s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2379 [2024-07-29 23:43:10,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.68 | bwd_microstep: 5653.94 | bwd_inner_microstep: 5249.93 | bwd_allreduce_microstep: 403.94 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3745 [2024-07-29 
23:43:19,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3764.88 | bwd_microstep: 5064.89 | bwd_inner_microstep: 5038.88 | bwd_allreduce_microstep: 25.94 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3781 [2024-07-29 23:43:27,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.25 | bwd_microstep: 5146.43 | bwd_inner_microstep: 5099.71 | bwd_allreduce_microstep: 46.65 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3748 [2024-07-29 23:43:36,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3753.30 | bwd_microstep: 5007.79 | bwd_inner_microstep: 4988.11 | bwd_allreduce_microstep: 19.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-29 23:43:45,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3617.25 | bwd_microstep: 5158.37 | bwd_inner_microstep: 5087.88 | bwd_allreduce_microstep: 70.43 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2160 [2024-07-29 23:43:54,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3515.66 | bwd_microstep: 5156.90 | bwd_inner_microstep: 4755.65 | bwd_allreduce_microstep: 401.19 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 [2024-07-29 23:44:02,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3698.94 | bwd_microstep: 4902.66 | bwd_inner_microstep: 4883.29 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2156 [2024-07-29 23:44:11,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-29 23:44:11,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3479.65 | bwd_microstep: 5038.74 | 
bwd_inner_microstep: 4648.82 | bwd_allreduce_microstep: 389.85 | step_microstep: 180.54 [2024-07-29 23:44:11,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28972.52 | bwd: 41129.71 | bwd_inner: 39752.22 | bwd_allreduce: 1377.02 | step: 181.12 92%|█████████▏| 615/671 [12:00:57<1:05:11, 69.84s/it] {'loss': 1.0625, 'learning_rate': 3.6405433856999684e-07, 'epoch': 0.92} 92%|█████████▏| 615/671 [12:00:57<1:05:11, 69.84s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2483 [2024-07-29 23:44:20,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3665.15 | bwd_microstep: 5258.69 | bwd_inner_microstep: 4852.52 | bwd_allreduce_microstep: 406.11 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3590 [2024-07-29 23:44:29,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3608.70 | bwd_microstep: 5606.16 | bwd_inner_microstep: 5516.87 | bwd_allreduce_microstep: 89.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3792 [2024-07-29 23:44:38,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3766.83 | bwd_microstep: 5026.40 | bwd_inner_microstep: 5006.99 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3745 [2024-07-29 23:44:47,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.46 | bwd_microstep: 5154.29 | bwd_inner_microstep: 5102.35 | bwd_allreduce_microstep: 51.88 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2114 [2024-07-29 23:44:56,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.09 | bwd_microstep: 5168.10 | bwd_inner_microstep: 4766.68 | bwd_allreduce_microstep: 401.35 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3675 [2024-07-29 
23:45:04,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.99 | bwd_microstep: 5116.77 | bwd_inner_microstep: 5064.19 | bwd_allreduce_microstep: 52.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-29 23:45:13,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.24 | bwd_microstep: 5083.35 | bwd_inner_microstep: 5020.53 | bwd_allreduce_microstep: 62.75 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2171 [2024-07-29 23:45:22,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 23:45:22,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3457.30 | bwd_microstep: 5039.30 | bwd_inner_microstep: 4646.59 | bwd_allreduce_microstep: 392.64 | step_microstep: 641.84 [2024-07-29 23:45:22,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28812.65 | bwd: 41453.05 | bwd_inner: 39976.67 | bwd_allreduce: 1475.91 | step: 642.54 92%|█████████▏| 616/671 [12:02:08<1:04:21, 70.21s/it] {'loss': 1.1049, 'learning_rate': 3.5124446656448654e-07, 'epoch': 0.92} 92%|█████████▏| 616/671 [12:02:08<1:04:21, 70.21s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2312 [2024-07-29 23:45:30,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3106.80 | bwd_microstep: 5130.25 | bwd_inner_microstep: 4736.84 | bwd_allreduce_microstep: 393.35 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1028 [2024-07-29 23:45:39,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3079.51 | bwd_microstep: 5190.28 | bwd_inner_microstep: 4794.07 | bwd_allreduce_microstep: 396.15 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3770 [2024-07-29 23:45:47,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3732.48 | bwd_microstep: 4983.92 | bwd_inner_microstep: 4964.66 | bwd_allreduce_microstep: 19.19 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3755 [2024-07-29 23:45:56,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.18 | bwd_microstep: 5188.97 | bwd_inner_microstep: 5135.65 | bwd_allreduce_microstep: 53.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-29 23:46:05,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.12 | bwd_microstep: 5210.35 | bwd_inner_microstep: 5123.78 | bwd_allreduce_microstep: 86.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674 [2024-07-29 23:46:14,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3630.36 | bwd_microstep: 5181.41 | bwd_inner_microstep: 5104.13 | bwd_allreduce_microstep: 77.22 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3723 [2024-07-29 23:46:22,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.23 | bwd_microstep: 4996.99 | bwd_inner_microstep: 4939.03 | bwd_allreduce_microstep: 57.89 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3682 [2024-07-29 23:46:31,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.71 [2024-07-29 23:46:31,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3669.87 | bwd_microstep: 4891.08 | bwd_inner_microstep: 4871.71 | bwd_allreduce_microstep: 19.30 | step_microstep: 180.66 [2024-07-29 23:46:31,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27998.45 | bwd: 40773.23 | bwd_inner: 39669.79 | bwd_allreduce: 1102.96 | step: 181.25 92%|█████████▏| 617/671 [12:03:17<1:02:53, 69.87s/it] {'loss': 1.0797, 'learning_rate': 3.3865998914997645e-07, 'epoch': 
0.92} 92%|█████████▏| 617/671 [12:03:17<1:02:53, 69.87s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3923 [2024-07-29 23:46:40,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.31 | bwd_microstep: 5266.46 | bwd_inner_microstep: 5223.09 | bwd_allreduce_microstep: 43.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3616 [2024-07-29 23:46:49,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.26 | bwd_microstep: 5192.58 | bwd_inner_microstep: 5106.57 | bwd_allreduce_microstep: 85.95 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2215 [2024-07-29 23:46:57,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3443.82 | bwd_microstep: 5050.24 | bwd_inner_microstep: 4660.96 | bwd_allreduce_microstep: 389.21 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2226 [2024-07-29 23:47:06,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3060.89 | bwd_microstep: 4998.92 | bwd_inner_microstep: 4614.01 | bwd_allreduce_microstep: 384.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635 [2024-07-29 23:47:14,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.30 | bwd_microstep: 4964.39 | bwd_inner_microstep: 4916.91 | bwd_allreduce_microstep: 47.42 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-07-29 23:47:23,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3454.59 | bwd_microstep: 5057.38 | bwd_inner_microstep: 4664.69 | bwd_allreduce_microstep: 392.63 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3714 [2024-07-29 23:47:31,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
3672.24 | bwd_microstep: 5088.50 | bwd_inner_microstep: 5069.21 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3670 [2024-07-29 23:47:40,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-29 23:47:40,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.06 | bwd_microstep: 5022.24 | bwd_inner_microstep: 4969.07 | bwd_allreduce_microstep: 53.10 | step_microstep: 371.59 [2024-07-29 23:47:40,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28050.36 | bwd: 40640.69 | bwd_inner: 39224.44 | bwd_allreduce: 1415.79 | step: 372.16 92%|█████████▏| 618/671 [12:04:26<1:01:32, 69.68s/it] {'loss': 1.0995, 'learning_rate': 3.2630120029942034e-07, 'epoch': 0.92} 92%|█████████▏| 618/671 [12:04:26<1:01:32, 69.68s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3913 [2024-07-29 23:47:50,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3898.92 | bwd_microstep: 5375.78 | bwd_inner_microstep: 5323.87 | bwd_allreduce_microstep: 51.84 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3601 [2024-07-29 23:47:58,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.93 | bwd_microstep: 5147.52 | bwd_inner_microstep: 5070.63 | bwd_allreduce_microstep: 76.83 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2244 [2024-07-29 23:48:07,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.24 | bwd_microstep: 5180.16 | bwd_inner_microstep: 4777.10 | bwd_allreduce_microstep: 402.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3645 [2024-07-29 23:48:15,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3228.05 | bwd_microstep: 4783.99 | bwd_inner_microstep: 4746.62 | 
bwd_allreduce_microstep: 37.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3655 [2024-07-29 23:48:23,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3113.79 | bwd_microstep: 4962.78 | bwd_inner_microstep: 4905.10 | bwd_allreduce_microstep: 57.62 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-29 23:48:32,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.15 | bwd_microstep: 5262.55 | bwd_inner_microstep: 4854.64 | bwd_allreduce_microstep: 407.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3679 [2024-07-29 23:48:41,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.94 | bwd_microstep: 5082.57 | bwd_inner_microstep: 5024.66 | bwd_allreduce_microstep: 57.84 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3683 [2024-07-29 23:48:50,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-29 23:48:50,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.25 | bwd_microstep: 5044.05 | bwd_inner_microstep: 4988.42 | bwd_allreduce_microstep: 55.56 | step_microstep: 180.71 [2024-07-29 23:48:50,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28106.18 | bwd: 40839.39 | bwd_inner: 39690.98 | bwd_allreduce: 1147.95 | step: 181.29 92%|█████████▏| 619/671 [12:05:36<1:00:16, 69.55s/it] {'loss': 1.1127, 'learning_rate': 3.1416838871368925e-07, 'epoch': 0.92} 92%|█████████▏| 619/671 [12:05:36<1:00:16, 69.55s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 4006 [2024-07-29 23:48:59,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3870.52 | bwd_microstep: 5249.26 | bwd_inner_microstep: 5230.08 | bwd_allreduce_microstep: 19.10 | step_microstep: 0.09 dynamic ViT batch 
size: 20, images per sample: 10.0, dynamic token length: 3837 [2024-07-29 23:49:08,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3584.74 | bwd_microstep: 5139.35 | bwd_inner_microstep: 5095.63 | bwd_allreduce_microstep: 43.65 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3783 [2024-07-29 23:49:16,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.34 | bwd_microstep: 5195.86 | bwd_inner_microstep: 5114.40 | bwd_allreduce_microstep: 81.39 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3787 [2024-07-29 23:49:25,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.66 | bwd_microstep: 5037.66 | bwd_inner_microstep: 5017.50 | bwd_allreduce_microstep: 20.08 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3717 [2024-07-29 23:49:34,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.26 | bwd_microstep: 5169.15 | bwd_inner_microstep: 5113.33 | bwd_allreduce_microstep: 55.75 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3637 [2024-07-29 23:49:43,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3569.27 | bwd_microstep: 5089.93 | bwd_inner_microstep: 5004.36 | bwd_allreduce_microstep: 85.50 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158 [2024-07-29 23:49:51,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.26 | bwd_microstep: 5106.60 | bwd_inner_microstep: 4706.91 | bwd_allreduce_microstep: 399.62 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2154 [2024-07-29 23:50:00,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-29 23:50:00,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 3481.12 | bwd_microstep: 5086.96 | bwd_inner_microstep: 4692.45 | bwd_allreduce_microstep: 394.45 | step_microstep: 180.62 [2024-07-29 23:50:00,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29000.06 | bwd: 41074.75 | bwd_inner: 39974.62 | bwd_allreduce: 1099.65 | step: 181.31 92%|█████████▏| 620/671 [12:06:46<59:20, 69.81s/it] {'loss': 1.1122, 'learning_rate': 3.0226183781483786e-07, 'epoch': 0.92} 92%|█████████▏| 620/671 [12:06:46<59:20, 69.81s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3872 [2024-07-29 23:50:09,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3836.58 | bwd_microstep: 5159.90 | bwd_inner_microstep: 5136.02 | bwd_allreduce_microstep: 23.81 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3813 [2024-07-29 23:50:18,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.55 | bwd_microstep: 5220.28 | bwd_inner_microstep: 5156.23 | bwd_allreduce_microstep: 63.98 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2082 [2024-07-29 23:50:27,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.36 | bwd_microstep: 5268.14 | bwd_inner_microstep: 4861.09 | bwd_allreduce_microstep: 406.98 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623 [2024-07-29 23:50:36,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.59 | bwd_microstep: 5207.23 | bwd_inner_microstep: 5119.39 | bwd_allreduce_microstep: 87.77 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2214 [2024-07-29 23:50:44,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3044.37 | bwd_microstep: 5015.42 | bwd_inner_microstep: 4630.07 | bwd_allreduce_microstep: 385.28 | step_microstep: 0.08 dynamic ViT batch size: 10, images per 
sample: 5.0, dynamic token length: 2131 [2024-07-29 23:50:52,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.73 | bwd_microstep: 5147.41 | bwd_inner_microstep: 4752.56 | bwd_allreduce_microstep: 394.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3705 [2024-07-29 23:51:01,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3701.56 | bwd_microstep: 4923.74 | bwd_inner_microstep: 4897.52 | bwd_allreduce_microstep: 26.15 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2144 [2024-07-29 23:51:10,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 23:51:10,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.56 | bwd_microstep: 5106.55 | bwd_inner_microstep: 4710.88 | bwd_allreduce_microstep: 395.61 | step_microstep: 181.76 [2024-07-29 23:51:10,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28420.20 | bwd: 41048.66 | bwd_inner: 39263.71 | bwd_allreduce: 1784.48 | step: 182.34 93%|█████████▎| 621/671 [12:07:56<58:10, 69.81s/it] {'loss': 1.1484, 'learning_rate': 2.90581825739481e-07, 'epoch': 0.92} 93%|█████████▎| 621/671 [12:07:56<58:10, 69.81s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2312 [2024-07-29 23:51:19,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.67 | bwd_microstep: 5366.05 | bwd_inner_microstep: 4952.62 | bwd_allreduce_microstep: 413.37 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2048 [2024-07-29 23:51:28,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3484.51 | bwd_microstep: 5116.29 | bwd_inner_microstep: 4718.10 | bwd_allreduce_microstep: 398.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3594 [2024-07-29 23:51:36,086] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3215.80 | bwd_microstep: 4844.85 | bwd_inner_microstep: 4796.78 | bwd_allreduce_microstep: 48.00 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3759 [2024-07-29 23:51:44,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3695.35 | bwd_microstep: 5090.15 | bwd_inner_microstep: 5053.49 | bwd_allreduce_microstep: 36.59 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3738 [2024-07-29 23:51:53,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.81 | bwd_microstep: 5012.18 | bwd_inner_microstep: 4990.16 | bwd_allreduce_microstep: 21.96 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735 [2024-07-29 23:52:02,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3752.30 | bwd_microstep: 4994.42 | bwd_inner_microstep: 4975.00 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3698 [2024-07-29 23:52:11,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3705.51 | bwd_microstep: 4888.44 | bwd_inner_microstep: 4869.04 | bwd_allreduce_microstep: 19.33 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3705 [2024-07-29 23:52:19,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-29 23:52:19,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.49 | bwd_microstep: 5023.68 | bwd_inner_microstep: 4955.40 | bwd_allreduce_microstep: 68.21 | step_microstep: 181.47 [2024-07-29 23:52:19,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28794.34 | bwd: 40336.05 | bwd_inner: 39310.53 | bwd_allreduce: 1025.06 | step: 182.05 93%|█████████▎| 622/671 [12:09:05<56:55, 69.70s/it] {'loss': 1.0956, 
'learning_rate': 2.791286253322856e-07, 'epoch': 0.93} 93%|█████████▎| 622/671 [12:09:05<56:55, 69.70s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2031 [2024-07-29 23:52:28,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.11 | bwd_microstep: 5346.46 | bwd_inner_microstep: 4934.62 | bwd_allreduce_microstep: 411.77 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2257 [2024-07-29 23:52:37,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.72 | bwd_microstep: 5233.16 | bwd_inner_microstep: 4827.59 | bwd_allreduce_microstep: 405.51 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3792 [2024-07-29 23:52:46,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.08 | bwd_microstep: 5024.84 | bwd_inner_microstep: 5005.53 | bwd_allreduce_microstep: 19.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3789 [2024-07-29 23:52:55,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3719.58 | bwd_microstep: 5029.41 | bwd_inner_microstep: 5009.99 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3737 [2024-07-29 23:53:03,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.18 | bwd_microstep: 4998.15 | bwd_inner_microstep: 4978.80 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-29 23:53:12,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3735.95 | bwd_microstep: 4992.12 | bwd_inner_microstep: 4972.80 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3650 [2024-07-29 23:53:21,376] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3598.21 | bwd_microstep: 5153.22 | bwd_inner_microstep: 5083.52 | bwd_allreduce_microstep: 69.64 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3666 [2024-07-29 23:53:29,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.76 [2024-07-29 23:53:29,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.57 | bwd_microstep: 4820.59 | bwd_inner_microstep: 4801.19 | bwd_allreduce_microstep: 19.33 | step_microstep: 181.50 [2024-07-29 23:53:29,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29150.30 | bwd: 40597.93 | bwd_inner: 39613.99 | bwd_allreduce: 983.45 | step: 182.10 93%|█████████▎| 623/671 [12:10:15<55:51, 69.81s/it] {'loss': 1.0632, 'learning_rate': 2.679025041396155e-07, 'epoch': 0.93} 93%|█████████▎| 623/671 [12:10:15<55:51, 69.81s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3893 [2024-07-29 23:53:38,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3789.71 | bwd_microstep: 5109.92 | bwd_inner_microstep: 5090.75 | bwd_allreduce_microstep: 19.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3865 [2024-07-29 23:53:47,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3643.21 | bwd_microstep: 5140.25 | bwd_inner_microstep: 5095.13 | bwd_allreduce_microstep: 45.05 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2171 [2024-07-29 23:53:56,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.30 | bwd_microstep: 5244.64 | bwd_inner_microstep: 4837.21 | bwd_allreduce_microstep: 407.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3742 [2024-07-29 23:54:05,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.94 | 
bwd_microstep: 5165.14 | bwd_inner_microstep: 5107.55 | bwd_allreduce_microstep: 57.53 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-07-29 23:54:13,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3466.91 | bwd_microstep: 5042.47 | bwd_inner_microstep: 4649.16 | bwd_allreduce_microstep: 393.25 | step_microstep: 0.10 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2185 [2024-07-29 23:54:21,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2998.25 | bwd_microstep: 4858.29 | bwd_inner_microstep: 4482.90 | bwd_allreduce_microstep: 375.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644 [2024-07-29 23:54:29,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3215.46 | bwd_microstep: 4845.23 | bwd_inner_microstep: 4797.98 | bwd_allreduce_microstep: 47.18 | step_microstep: 0.07 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2127 [2024-07-29 23:54:38,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-29 23:54:38,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3501.70 | bwd_microstep: 5123.19 | bwd_inner_microstep: 4726.82 | bwd_allreduce_microstep: 396.31 | step_microstep: 180.86 [2024-07-29 23:54:38,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27819.38 | bwd: 40529.11 | bwd_inner: 38787.43 | bwd_allreduce: 1741.21 | step: 181.44 93%|█████████▎| 624/671 [12:11:24<54:25, 69.47s/it] {'loss': 1.1184, 'learning_rate': 2.569037244032657e-07, 'epoch': 0.93} 93%|█████████▎| 624/671 [12:11:24<54:25, 69.47s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3590 [2024-07-29 23:54:47,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.35 | bwd_microstep: 5192.97 | bwd_inner_microstep: 5091.16 | bwd_allreduce_microstep: 
101.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3838 [2024-07-29 23:54:56,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3642.35 | bwd_microstep: 5230.95 | bwd_inner_microstep: 5174.80 | bwd_allreduce_microstep: 56.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2258 [2024-07-29 23:55:05,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.65 | bwd_microstep: 5191.43 | bwd_inner_microstep: 4786.39 | bwd_allreduce_microstep: 404.97 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3622 [2024-07-29 23:55:13,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.31 | bwd_microstep: 5137.29 | bwd_inner_microstep: 5067.82 | bwd_allreduce_microstep: 69.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3718 [2024-07-29 23:55:22,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.86 | bwd_microstep: 4990.49 | bwd_inner_microstep: 4971.19 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3647 [2024-07-29 23:55:31,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.14 | bwd_microstep: 5100.30 | bwd_inner_microstep: 5023.56 | bwd_allreduce_microstep: 76.67 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3710 [2024-07-29 23:55:39,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3666.88 | bwd_microstep: 4895.91 | bwd_inner_microstep: 4876.50 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644 [2024-07-29 23:55:48,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-29 
23:55:48,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.54 | bwd_microstep: 5074.46 | bwd_inner_microstep: 5007.45 | bwd_allreduce_microstep: 66.94 | step_microstep: 181.51 [2024-07-29 23:55:48,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28907.98 | bwd: 40813.77 | bwd_inner: 39998.82 | bwd_allreduce: 814.49 | step: 182.08 93%|█████████▎| 625/671 [12:12:34<53:23, 69.65s/it] {'loss': 1.1178, 'learning_rate': 2.461325430543482e-07, 'epoch': 0.93} 93%|█████████▎| 625/671 [12:12:34<53:23, 69.65s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2028 [2024-07-29 23:55:57,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.53 | bwd_microstep: 5236.80 | bwd_inner_microstep: 4832.28 | bwd_allreduce_microstep: 404.45 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2251 [2024-07-29 23:56:05,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3056.62 | bwd_microstep: 5060.09 | bwd_inner_microstep: 4671.23 | bwd_allreduce_microstep: 388.80 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2268 [2024-07-29 23:56:14,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.98 | bwd_microstep: 5224.77 | bwd_inner_microstep: 4817.91 | bwd_allreduce_microstep: 406.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3782 [2024-07-29 23:56:23,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.08 | bwd_microstep: 5126.51 | bwd_inner_microstep: 5079.66 | bwd_allreduce_microstep: 46.79 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-29 23:56:31,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3230.91 | bwd_microstep: 4828.53 | bwd_inner_microstep: 4787.06 | bwd_allreduce_microstep: 41.41 | 
step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3754 [2024-07-29 23:56:39,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3683.79 | bwd_microstep: 4982.61 | bwd_inner_microstep: 4945.34 | bwd_allreduce_microstep: 37.21 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2164 [2024-07-29 23:56:48,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.54 | bwd_microstep: 5164.76 | bwd_inner_microstep: 4762.92 | bwd_allreduce_microstep: 401.77 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3704 [2024-07-29 23:56:57,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-29 23:56:57,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3604.11 | bwd_microstep: 5047.87 | bwd_inner_microstep: 4969.88 | bwd_allreduce_microstep: 77.92 | step_microstep: 180.30 [2024-07-29 23:56:57,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27823.46 | bwd: 40671.93 | bwd_inner: 38866.23 | bwd_allreduce: 1805.24 | step: 180.87 93%|█████████▎| 626/671 [12:13:43<52:03, 69.40s/it] {'loss': 1.1309, 'learning_rate': 2.3558921170728e-07, 'epoch': 0.93} 93%|█████████▎| 626/671 [12:13:43<52:03, 69.40s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2303 [2024-07-29 23:57:05,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3115.63 | bwd_microstep: 5155.60 | bwd_inner_microstep: 4763.97 | bwd_allreduce_microstep: 391.57 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2044 [2024-07-29 23:57:14,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.62 | bwd_microstep: 5276.94 | bwd_inner_microstep: 4868.09 | bwd_allreduce_microstep: 408.78 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic 
token length: 2250 [2024-07-29 23:57:22,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3041.79 | bwd_microstep: 4984.41 | bwd_inner_microstep: 4601.36 | bwd_allreduce_microstep: 382.98 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3762 [2024-07-29 23:57:31,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.66 | bwd_microstep: 5148.65 | bwd_inner_microstep: 5076.85 | bwd_allreduce_microstep: 71.73 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2159 [2024-07-29 23:57:40,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.80 | bwd_microstep: 5158.40 | bwd_inner_microstep: 4758.76 | bwd_allreduce_microstep: 399.57 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3720 [2024-07-29 23:57:48,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3656.64 | bwd_microstep: 4949.40 | bwd_inner_microstep: 4926.39 | bwd_allreduce_microstep: 22.93 | step_microstep: 0.18 dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3665 [2024-07-29 23:57:57,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.50 | bwd_microstep: 4923.08 | bwd_inner_microstep: 4897.90 | bwd_allreduce_microstep: 25.11 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3688 [2024-07-29 23:58:06,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-29 23:58:06,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3690.75 | bwd_microstep: 4896.43 | bwd_inner_microstep: 4877.05 | bwd_allreduce_microstep: 19.31 | step_microstep: 181.57 [2024-07-29 23:58:06,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27849.31 | bwd: 40492.87 | bwd_inner: 38770.31 | bwd_allreduce: 1722.09 | step: 182.27 93%|█████████▎| 
627/671 [12:14:52<50:44, 69.18s/it] {'loss': 1.1154, 'learning_rate': 2.2527397665391137e-07, 'epoch': 0.93} 93%|█████████▎| 627/671 [12:14:52<50:44, 69.18s/it]dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3932 [2024-07-29 23:58:15,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.99 | bwd_microstep: 5340.57 | bwd_inner_microstep: 5270.90 | bwd_allreduce_microstep: 69.61 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3563 [2024-07-29 23:58:24,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3629.22 | bwd_microstep: 5272.85 | bwd_inner_microstep: 5176.52 | bwd_allreduce_microstep: 96.26 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3785 [2024-07-29 23:58:32,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3781.14 | bwd_microstep: 5028.64 | bwd_inner_microstep: 5006.79 | bwd_allreduce_microstep: 21.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3748 [2024-07-29 23:58:41,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.85 | bwd_microstep: 5059.63 | bwd_inner_microstep: 5033.39 | bwd_allreduce_microstep: 26.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2184 [2024-07-29 23:58:50,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.25 | bwd_microstep: 5218.84 | bwd_inner_microstep: 4812.58 | bwd_allreduce_microstep: 406.20 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3733 [2024-07-29 23:58:59,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3751.01 | bwd_microstep: 4982.18 | bwd_inner_microstep: 4962.89 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 
3695 [2024-07-29 23:59:07,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3207.46 | bwd_microstep: 4720.97 | bwd_inner_microstep: 4696.41 | bwd_allreduce_microstep: 24.50 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3693 [2024-07-29 23:59:16,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.76 [2024-07-29 23:59:16,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3667.87 | bwd_microstep: 4899.56 | bwd_inner_microstep: 4880.18 | bwd_allreduce_microstep: 19.31 | step_microstep: 181.19 [2024-07-29 23:59:16,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29052.70 | bwd: 40523.22 | bwd_inner: 39839.59 | bwd_allreduce: 683.15 | step: 181.78 94%|█████████▎| 628/671 [12:16:02<49:44, 69.40s/it] {'loss': 1.1322, 'learning_rate': 2.1518707885777147e-07, 'epoch': 0.93} 94%|█████████▎| 628/671 [12:16:02<49:44, 69.40s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 4033 [2024-07-29 23:59:24,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.60 | bwd_microstep: 5194.41 | bwd_inner_microstep: 5175.34 | bwd_allreduce_microstep: 19.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2319 [2024-07-29 23:59:33,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.36 | bwd_microstep: 5206.46 | bwd_inner_microstep: 4802.62 | bwd_allreduce_microstep: 403.78 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3768 [2024-07-29 23:59:42,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.50 | bwd_microstep: 5031.25 | bwd_inner_microstep: 5011.88 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2234 [2024-07-29 23:59:50,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 3053.02 | bwd_microstep: 5037.12 | bwd_inner_microstep: 4650.65 | bwd_allreduce_microstep: 386.40 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2114 [2024-07-29 23:59:59,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.89 | bwd_microstep: 5235.71 | bwd_inner_microstep: 4826.35 | bwd_allreduce_microstep: 409.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3706 [2024-07-30 00:00:08,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3741.45 | bwd_microstep: 5051.34 | bwd_inner_microstep: 5010.29 | bwd_allreduce_microstep: 40.98 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3694 [2024-07-30 00:00:16,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.77 | bwd_microstep: 5057.96 | bwd_inner_microstep: 4999.58 | bwd_allreduce_microstep: 58.32 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-30 00:00:25,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-30 00:00:25,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3709.31 | bwd_microstep: 4951.51 | bwd_inner_microstep: 4918.54 | bwd_allreduce_microstep: 32.90 | step_microstep: 181.01 [2024-07-30 00:00:25,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28658.81 | bwd: 40765.75 | bwd_inner: 39395.19 | bwd_allreduce: 1370.09 | step: 181.59 94%|█████████▎| 629/671 [12:17:11<48:39, 69.51s/it] {'loss': 1.1287, 'learning_rate': 2.0532875394844053e-07, 'epoch': 0.94} 94%|█████████▎| 629/671 [12:17:11<48:39, 69.51s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3951 [2024-07-30 00:00:34,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3658.93 | bwd_microstep: 5234.72 | bwd_inner_microstep: 
5190.68 | bwd_allreduce_microstep: 43.98 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2272 [2024-07-30 00:00:43,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3474.12 | bwd_microstep: 5112.51 | bwd_inner_microstep: 4716.39 | bwd_allreduce_microstep: 396.04 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3611 [2024-07-30 00:00:51,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.46 | bwd_microstep: 5103.98 | bwd_inner_microstep: 5036.98 | bwd_allreduce_microstep: 66.93 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3628 [2024-07-30 00:01:00,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.94 | bwd_microstep: 5093.44 | bwd_inner_microstep: 5022.81 | bwd_allreduce_microstep: 70.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3767 [2024-07-30 00:01:09,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.63 | bwd_microstep: 4955.37 | bwd_inner_microstep: 4925.75 | bwd_allreduce_microstep: 29.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-30 00:01:17,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.29 | bwd_microstep: 5073.55 | bwd_inner_microstep: 5011.13 | bwd_allreduce_microstep: 62.35 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3700 [2024-07-30 00:01:26,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.48 | bwd_microstep: 5053.93 | bwd_inner_microstep: 5011.80 | bwd_allreduce_microstep: 42.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-07-30 00:01:34,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_step: 92.48 [2024-07-30 00:01:34,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3232.33 | bwd_microstep: 4704.15 | bwd_inner_microstep: 4681.96 | bwd_allreduce_microstep: 22.13 | step_microstep: 180.65 [2024-07-30 00:01:34,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28426.07 | bwd: 40331.64 | bwd_inner: 39597.44 | bwd_allreduce: 733.73 | step: 181.23 94%|█████████▍| 630/671 [12:18:20<47:24, 69.38s/it] {'loss': 1.1287, 'learning_rate': 1.9569923221604224e-07, 'epoch': 0.94} 94%|█████████▍| 630/671 [12:18:20<47:24, 69.38s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3910 [2024-07-30 00:01:43,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3676.64 | bwd_microstep: 5318.59 | bwd_inner_microstep: 5260.08 | bwd_allreduce_microstep: 58.44 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3838 [2024-07-30 00:01:52,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3778.76 | bwd_microstep: 5087.68 | bwd_inner_microstep: 5063.31 | bwd_allreduce_microstep: 24.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3822 [2024-07-30 00:02:01,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.68 | bwd_microstep: 5044.39 | bwd_inner_microstep: 5024.96 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3609 [2024-07-30 00:02:09,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3201.53 | bwd_microstep: 4800.06 | bwd_inner_microstep: 4758.86 | bwd_allreduce_microstep: 41.13 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2158 [2024-07-30 00:02:18,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3523.24 | bwd_microstep: 5162.98 | bwd_inner_microstep: 4762.87 | 
bwd_allreduce_microstep: 400.04 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3740 [2024-07-30 00:02:27,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.66 | bwd_microstep: 4992.29 | bwd_inner_microstep: 4972.97 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-30 00:02:35,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3517.42 | bwd_microstep: 4943.06 | bwd_inner_microstep: 4902.60 | bwd_allreduce_microstep: 40.40 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-30 00:02:44,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-30 00:02:44,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.95 | bwd_microstep: 5056.80 | bwd_inner_microstep: 4997.84 | bwd_allreduce_microstep: 58.90 | step_microstep: 181.21 [2024-07-30 00:02:44,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28779.76 | bwd: 40405.82 | bwd_inner: 39743.42 | bwd_allreduce: 661.94 | step: 181.91 94%|█████████▍| 631/671 [12:19:30<46:16, 69.42s/it] {'loss': 1.0926, 'learning_rate': 1.8629873860586567e-07, 'epoch': 0.94} 94%|█████████▍| 631/671 [12:19:30<46:16, 69.42s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2331 [2024-07-30 00:02:53,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.42 | bwd_microstep: 5715.38 | bwd_inner_microstep: 5302.43 | bwd_allreduce_microstep: 412.87 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2292 [2024-07-30 00:03:02,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3532.62 | bwd_microstep: 5191.93 | bwd_inner_microstep: 4787.12 | bwd_allreduce_microstep: 404.75 | step_microstep: 0.08 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3789 [2024-07-30 00:03:11,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3614.34 | bwd_microstep: 5219.67 | bwd_inner_microstep: 5165.16 | bwd_allreduce_microstep: 54.44 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3785 [2024-07-30 00:03:20,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3757.20 | bwd_microstep: 5024.68 | bwd_inner_microstep: 5005.21 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3736 [2024-07-30 00:03:28,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3762.30 | bwd_microstep: 5018.57 | bwd_inner_microstep: 4993.72 | bwd_allreduce_microstep: 24.75 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3644 [2024-07-30 00:03:36,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3222.52 | bwd_microstep: 4851.01 | bwd_inner_microstep: 4804.68 | bwd_allreduce_microstep: 46.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705 [2024-07-30 00:03:45,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.50 | bwd_microstep: 5084.71 | bwd_inner_microstep: 5019.79 | bwd_allreduce_microstep: 64.85 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-30 00:03:54,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.77 [2024-07-30 00:03:54,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3760.99 | bwd_microstep: 5012.92 | bwd_inner_microstep: 4971.34 | bwd_allreduce_microstep: 41.51 | step_microstep: 181.69 [2024-07-30 00:03:54,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28767.79 | bwd: 41118.87 | bwd_inner: 40049.40 | bwd_allreduce: 1068.95 
| step: 182.30 94%|█████████▍| 632/671 [12:20:40<45:16, 69.66s/it] {'loss': 1.1313, 'learning_rate': 1.7712749271311392e-07, 'epoch': 0.94} 94%|█████████▍| 632/671 [12:20:40<45:16, 69.66s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2374 [2024-07-30 00:04:03,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.00 | bwd_microstep: 5277.90 | bwd_inner_microstep: 4872.51 | bwd_allreduce_microstep: 405.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3787 [2024-07-30 00:04:12,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.39 | bwd_microstep: 5195.63 | bwd_inner_microstep: 5143.36 | bwd_allreduce_microstep: 52.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3619 [2024-07-30 00:04:21,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.11 | bwd_microstep: 5175.39 | bwd_inner_microstep: 5091.47 | bwd_allreduce_microstep: 83.85 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3735 [2024-07-30 00:04:29,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.08 | bwd_microstep: 4986.62 | bwd_inner_microstep: 4967.18 | bwd_allreduce_microstep: 19.35 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2206 [2024-07-30 00:04:38,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3560.68 | bwd_microstep: 5208.06 | bwd_inner_microstep: 4805.48 | bwd_allreduce_microstep: 402.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3687 [2024-07-30 00:04:47,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.01 | bwd_microstep: 4982.56 | bwd_inner_microstep: 4935.81 | bwd_allreduce_microstep: 46.69 | step_microstep: 0.08 dynamic ViT batch size: 14, images per 
sample: 7.0, dynamic token length: 2156 [2024-07-30 00:04:55,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.33 | bwd_microstep: 5099.07 | bwd_inner_microstep: 4704.81 | bwd_allreduce_microstep: 394.20 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3665 [2024-07-30 00:05:04,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.47 [2024-07-30 00:05:04,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.46 | bwd_microstep: 5000.44 | bwd_inner_microstep: 4931.10 | bwd_allreduce_microstep: 69.27 | step_microstep: 181.47 [2024-07-30 00:05:04,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28723.96 | bwd: 40925.64 | bwd_inner: 39451.64 | bwd_allreduce: 1473.51 | step: 182.04 94%|█████████▍| 633/671 [12:21:50<44:10, 69.76s/it] {'loss': 1.1336, 'learning_rate': 1.681857087777672e-07, 'epoch': 0.94} 94%|█████████▍| 633/671 [12:21:50<44:10, 69.76s/it]dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1308 [2024-07-30 00:05:12,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3046.93 | bwd_microstep: 5109.60 | bwd_inner_microstep: 4717.33 | bwd_allreduce_microstep: 392.20 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3824 [2024-07-30 00:05:21,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.66 | bwd_microstep: 5072.97 | bwd_inner_microstep: 5048.83 | bwd_allreduce_microstep: 24.08 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3815 [2024-07-30 00:05:30,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3756.49 | bwd_microstep: 5058.70 | bwd_inner_microstep: 5037.48 | bwd_allreduce_microstep: 21.16 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2232 [2024-07-30 00:05:39,305] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.16 | bwd_microstep: 5245.32 | bwd_inner_microstep: 4838.02 | bwd_allreduce_microstep: 407.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3671 [2024-07-30 00:05:47,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.50 | bwd_microstep: 5093.24 | bwd_inner_microstep: 5030.72 | bwd_allreduce_microstep: 62.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-30 00:05:56,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.88 | bwd_microstep: 5105.99 | bwd_inner_microstep: 5042.75 | bwd_allreduce_microstep: 63.18 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2134 [2024-07-30 00:06:05,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.76 | bwd_microstep: 5211.04 | bwd_inner_microstep: 4804.00 | bwd_allreduce_microstep: 406.97 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3758 [2024-07-30 00:06:14,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-30 00:06:14,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.50 | bwd_microstep: 4994.46 | bwd_inner_microstep: 4955.15 | bwd_allreduce_microstep: 39.25 | step_microstep: 180.86 [2024-07-30 00:06:14,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28386.76 | bwd: 40891.31 | bwd_inner: 39474.22 | bwd_allreduce: 1416.63 | step: 181.44 94%|█████████▍| 634/671 [12:23:00<42:59, 69.71s/it] {'loss': 1.1612, 'learning_rate': 1.5947359567958677e-07, 'epoch': 0.94} 94%|█████████▍| 634/671 [12:23:00<42:59, 69.71s/it]dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2422 [2024-07-30 00:06:23,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.48 | 
bwd_microstep: 5319.19 | bwd_inner_microstep: 4907.96 | bwd_allreduce_microstep: 411.16 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2260 [2024-07-30 00:06:31,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3547.10 | bwd_microstep: 5197.40 | bwd_inner_microstep: 4792.54 | bwd_allreduce_microstep: 404.80 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620 [2024-07-30 00:06:40,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.49 | bwd_microstep: 5177.97 | bwd_inner_microstep: 5095.52 | bwd_allreduce_microstep: 82.39 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3667 [2024-07-30 00:06:48,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3108.55 | bwd_microstep: 4980.01 | bwd_inner_microstep: 4920.37 | bwd_allreduce_microstep: 59.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-30 00:06:57,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.35 | bwd_microstep: 5098.64 | bwd_inner_microstep: 5053.11 | bwd_allreduce_microstep: 45.46 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3732 [2024-07-30 00:07:06,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3555.16 | bwd_microstep: 5089.10 | bwd_inner_microstep: 5036.13 | bwd_allreduce_microstep: 52.91 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2107 [2024-07-30 00:07:14,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3478.32 | bwd_microstep: 5062.11 | bwd_inner_microstep: 4669.98 | bwd_allreduce_microstep: 392.06 | step_microstep: 0.19 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2174 [2024-07-30 00:07:23,807] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-30 00:07:23,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3343.28 | bwd_microstep: 5525.00 | bwd_inner_microstep: 4898.68 | bwd_allreduce_microstep: 626.25 | step_microstep: 180.86 [2024-07-30 00:07:23,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27818.64 | bwd: 41449.41 | bwd_inner: 39374.22 | bwd_allreduce: 2074.70 | step: 181.55 95%|█████████▍| 635/671 [12:24:09<41:48, 69.67s/it] {'loss': 1.1071, 'learning_rate': 1.5099135693322776e-07, 'epoch': 0.95} 95%|█████████▍| 635/671 [12:24:09<41:48, 69.67s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3895 [2024-07-30 00:07:33,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3931.71 | bwd_microstep: 5323.97 | bwd_inner_microstep: 5257.09 | bwd_allreduce_microstep: 66.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3798 [2024-07-30 00:07:41,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3582.73 | bwd_microstep: 5156.05 | bwd_inner_microstep: 5109.27 | bwd_allreduce_microstep: 46.72 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3783 [2024-07-30 00:07:50,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.58 | bwd_microstep: 5195.02 | bwd_inner_microstep: 5138.13 | bwd_allreduce_microstep: 56.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3627 [2024-07-30 00:07:59,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.16 | bwd_microstep: 5106.96 | bwd_inner_microstep: 5038.75 | bwd_allreduce_microstep: 68.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3623 [2024-07-30 00:08:07,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.70 | 
bwd_microstep: 5007.32 | bwd_inner_microstep: 4951.71 | bwd_allreduce_microstep: 55.54 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3713 [2024-07-30 00:08:16,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.34 | bwd_microstep: 4887.50 | bwd_inner_microstep: 4858.74 | bwd_allreduce_microstep: 28.69 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3682 [2024-07-30 00:08:24,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3059.56 | bwd_microstep: 4829.98 | bwd_inner_microstep: 4784.85 | bwd_allreduce_microstep: 45.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-07-30 00:08:33,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-30 00:08:33,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.89 | bwd_microstep: 5078.90 | bwd_inner_microstep: 5018.19 | bwd_allreduce_microstep: 60.65 | step_microstep: 180.86 [2024-07-30 00:08:33,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28477.56 | bwd: 40585.68 | bwd_inner: 40156.66 | bwd_allreduce: 428.54 | step: 181.45 95%|█████████▍| 636/671 [12:25:19<40:35, 69.59s/it] {'loss': 1.102, 'learning_rate': 1.4273919068349184e-07, 'epoch': 0.95} 95%|█████████▍| 636/671 [12:25:19<40:35, 69.59s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3937 [2024-07-30 00:08:42,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3797.02 | bwd_microstep: 5194.30 | bwd_inner_microstep: 5175.17 | bwd_allreduce_microstep: 19.05 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3879 [2024-07-30 00:08:51,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3825.72 | bwd_microstep: 5138.50 | bwd_inner_microstep: 5119.14 | bwd_allreduce_microstep: 
19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3620 [2024-07-30 00:08:59,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3219.95 | bwd_microstep: 4832.86 | bwd_inner_microstep: 4791.50 | bwd_allreduce_microstep: 41.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-30 00:09:07,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3233.39 | bwd_microstep: 4911.73 | bwd_inner_microstep: 4861.46 | bwd_allreduce_microstep: 50.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3718 [2024-07-30 00:09:16,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3603.97 | bwd_microstep: 5134.12 | bwd_inner_microstep: 5081.29 | bwd_allreduce_microstep: 52.77 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3655 [2024-07-30 00:09:24,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.56 | bwd_microstep: 5112.16 | bwd_inner_microstep: 5023.10 | bwd_allreduce_microstep: 88.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3686 [2024-07-30 00:09:33,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.71 | bwd_microstep: 5101.23 | bwd_inner_microstep: 5037.81 | bwd_allreduce_microstep: 63.36 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3692 [2024-07-30 00:09:42,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.44 [2024-07-30 00:09:42,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.82 | bwd_microstep: 5050.85 | bwd_inner_microstep: 4973.33 | bwd_allreduce_microstep: 77.46 | step_microstep: 180.85 [2024-07-30 00:09:42,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28448.04 | bwd: 
40475.72 | bwd_inner: 40062.74 | bwd_allreduce: 412.51 | step: 181.42 95%|█████████▍| 637/671 [12:26:28<39:22, 69.49s/it] {'loss': 1.1045, 'learning_rate': 1.3471728970068986e-07, 'epoch': 0.95} 95%|█████████▍| 637/671 [12:26:28<39:22, 69.49s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3571 [2024-07-30 00:09:50,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3205.57 | bwd_microstep: 4976.67 | bwd_inner_microstep: 4911.91 | bwd_allreduce_microstep: 64.69 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3560 [2024-07-30 00:09:59,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3543.84 | bwd_microstep: 5066.62 | bwd_inner_microstep: 4993.63 | bwd_allreduce_microstep: 72.92 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2231 [2024-07-30 00:10:08,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.04 | bwd_microstep: 5173.62 | bwd_inner_microstep: 4769.29 | bwd_allreduce_microstep: 404.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2241 [2024-07-30 00:10:16,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3571.85 | bwd_microstep: 5224.74 | bwd_inner_microstep: 4820.36 | bwd_allreduce_microstep: 404.31 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3630 [2024-07-30 00:10:25,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.59 | bwd_microstep: 5133.55 | bwd_inner_microstep: 5060.04 | bwd_allreduce_microstep: 73.45 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2191 [2024-07-30 00:10:34,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3506.19 | bwd_microstep: 5157.56 | bwd_inner_microstep: 4757.04 | bwd_allreduce_microstep: 400.45 | 
step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2156 [2024-07-30 00:10:42,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3514.10 | bwd_microstep: 5150.08 | bwd_inner_microstep: 4750.22 | bwd_allreduce_microstep: 399.79 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2163 [2024-07-30 00:10:51,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-30 00:10:51,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3503.84 | bwd_microstep: 5098.92 | bwd_inner_microstep: 4702.34 | bwd_allreduce_microstep: 396.52 | step_microstep: 181.16 [2024-07-30 00:10:51,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27991.94 | bwd: 40981.73 | bwd_inner: 38764.75 | bwd_allreduce: 2216.50 | step: 181.76 95%|█████████▌| 638/671 [12:27:37<38:11, 69.43s/it] {'loss': 1.125, 'learning_rate': 1.2692584137615205e-07, 'epoch': 0.95} 95%|█████████▌| 638/671 [12:27:37<38:11, 69.43s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3625 [2024-07-30 00:11:00,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3641.29 | bwd_microstep: 5206.57 | bwd_inner_microstep: 5127.22 | bwd_allreduce_microstep: 79.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2274 [2024-07-30 00:11:09,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.01 | bwd_microstep: 5249.12 | bwd_inner_microstep: 4841.06 | bwd_allreduce_microstep: 407.99 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3806 [2024-07-30 00:11:18,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3605.95 | bwd_microstep: 5154.98 | bwd_inner_microstep: 5102.26 | bwd_allreduce_microstep: 52.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, 
dynamic token length: 3739 [2024-07-30 00:11:26,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.46 | bwd_microstep: 4985.33 | bwd_inner_microstep: 4965.93 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-30 00:11:35,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3597.68 | bwd_microstep: 5140.33 | bwd_inner_microstep: 5086.28 | bwd_allreduce_microstep: 53.99 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3712 [2024-07-30 00:11:44,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3687.32 | bwd_microstep: 4903.00 | bwd_inner_microstep: 4883.55 | bwd_allreduce_microstep: 19.38 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3675 [2024-07-30 00:11:52,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3690.84 | bwd_microstep: 4882.36 | bwd_inner_microstep: 4863.00 | bwd_allreduce_microstep: 19.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3682 [2024-07-30 00:12:01,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.49 [2024-07-30 00:12:01,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3585.38 | bwd_microstep: 5061.35 | bwd_inner_microstep: 5003.88 | bwd_allreduce_microstep: 57.41 | step_microstep: 183.64 [2024-07-30 00:12:01,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29109.83 | bwd: 40583.01 | bwd_inner: 39873.11 | bwd_allreduce: 709.43 | step: 184.32 95%|█████████▌| 639/671 [12:28:47<37:07, 69.61s/it] {'loss': 1.1887, 'learning_rate': 1.1936502771783488e-07, 'epoch': 0.95} 95%|█████████▌| 639/671 [12:28:47<37:07, 69.61s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2012 [2024-07-30 00:12:10,738] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.03 | bwd_microstep: 5324.18 | bwd_inner_microstep: 4914.26 | bwd_allreduce_microstep: 409.86 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3852 [2024-07-30 00:12:19,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3661.98 | bwd_microstep: 5286.07 | bwd_inner_microstep: 5222.53 | bwd_allreduce_microstep: 63.47 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3777 [2024-07-30 00:12:28,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.16 | bwd_microstep: 5021.58 | bwd_inner_microstep: 4988.45 | bwd_allreduce_microstep: 33.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3668 [2024-07-30 00:12:37,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.27 | bwd_microstep: 5169.50 | bwd_inner_microstep: 5094.19 | bwd_allreduce_microstep: 75.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2181 [2024-07-30 00:12:45,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.72 | bwd_microstep: 5099.19 | bwd_inner_microstep: 4704.13 | bwd_allreduce_microstep: 394.99 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3738 [2024-07-30 00:12:54,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.92 | bwd_microstep: 4987.81 | bwd_inner_microstep: 4968.47 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3707 [2024-07-30 00:13:03,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3563.94 | bwd_microstep: 5058.80 | bwd_inner_microstep: 4999.05 | bwd_allreduce_microstep: 59.68 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, 
dynamic token length: 3723 [2024-07-30 00:13:12,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-30 00:13:12,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.21 | bwd_microstep: 4985.00 | bwd_inner_microstep: 4965.58 | bwd_allreduce_microstep: 19.35 | step_microstep: 182.24 [2024-07-30 00:13:12,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28980.13 | bwd: 40932.11 | bwd_inner: 39856.60 | bwd_allreduce: 1075.02 | step: 182.81 95%|█████████▌| 640/671 [12:29:57<36:03, 69.80s/it] {'loss': 1.0997, 'learning_rate': 1.1203502534608113e-07, 'epoch': 0.95} 95%|█████████▌| 640/671 [12:29:57<36:03, 69.80s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2006 [2024-07-30 00:13:20,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3105.36 | bwd_microstep: 5226.07 | bwd_inner_microstep: 4826.92 | bwd_allreduce_microstep: 399.08 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3790 [2024-07-30 00:13:29,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3736.77 | bwd_microstep: 5052.13 | bwd_inner_microstep: 5027.12 | bwd_allreduce_microstep: 24.95 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2233 [2024-07-30 00:13:37,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3051.31 | bwd_microstep: 5047.54 | bwd_inner_microstep: 4660.24 | bwd_allreduce_microstep: 387.23 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2089 [2024-07-30 00:13:46,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3554.27 | bwd_microstep: 5217.63 | bwd_inner_microstep: 4812.01 | bwd_allreduce_microstep: 405.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-07-30 00:13:54,716] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.15 | bwd_microstep: 5024.28 | bwd_inner_microstep: 4985.71 | bwd_allreduce_microstep: 38.50 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-30 00:14:03,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3745.61 | bwd_microstep: 4982.74 | bwd_inner_microstep: 4963.41 | bwd_allreduce_microstep: 19.26 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-07-30 00:14:12,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.13 | bwd_microstep: 5067.20 | bwd_inner_microstep: 5006.36 | bwd_allreduce_microstep: 60.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3672 [2024-07-30 00:14:20,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-30 00:14:20,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3190.68 | bwd_microstep: 4714.66 | bwd_inner_microstep: 4690.70 | bwd_allreduce_microstep: 23.89 | step_microstep: 207.81 [2024-07-30 00:14:20,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27553.19 | bwd: 40332.24 | bwd_inner: 38972.42 | bwd_allreduce: 1359.34 | step: 208.38 96%|█████████▌| 641/671 [12:31:06<34:39, 69.33s/it] {'loss': 1.0907, 'learning_rate': 1.0493600548948879e-07, 'epoch': 0.95} 96%|█████████▌| 641/671 [12:31:06<34:39, 69.33s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3913 [2024-07-30 00:14:29,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3796.37 | bwd_microstep: 5197.52 | bwd_inner_microstep: 5178.32 | bwd_allreduce_microstep: 19.13 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3581 [2024-07-30 00:14:38,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3596.43 | 
bwd_microstep: 5162.56 | bwd_inner_microstep: 5079.76 | bwd_allreduce_microstep: 82.73 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3749 [2024-07-30 00:14:46,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.29 | bwd_microstep: 4997.90 | bwd_inner_microstep: 4978.52 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.07 dynamic ViT batch size: 17, images per sample: 8.5, dynamic token length: 3637 [2024-07-30 00:14:54,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3111.97 | bwd_microstep: 4975.97 | bwd_inner_microstep: 4918.60 | bwd_allreduce_microstep: 57.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3703 [2024-07-30 00:15:03,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.04 | bwd_microstep: 4968.74 | bwd_inner_microstep: 4936.02 | bwd_allreduce_microstep: 32.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3720 [2024-07-30 00:15:11,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3263.96 | bwd_microstep: 4827.39 | bwd_inner_microstep: 4802.67 | bwd_allreduce_microstep: 24.65 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2188 [2024-07-30 00:15:20,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3533.45 | bwd_microstep: 5121.58 | bwd_inner_microstep: 4723.89 | bwd_allreduce_microstep: 397.62 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3706 [2024-07-30 00:15:29,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-30 00:15:29,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.38 | bwd_microstep: 5161.67 | bwd_inner_microstep: 5089.66 | bwd_allreduce_microstep: 71.95 | step_microstep: 181.56 [2024-07-30 
00:15:29,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28380.80 | bwd: 40413.31 | bwd_inner: 39707.39 | bwd_allreduce: 705.45 | step: 182.15 96%|█████████▌| 642/671 [12:32:15<33:28, 69.27s/it] {'loss': 1.1638, 'learning_rate': 9.806813398091419e-08, 'epoch': 0.96} 96%|█████████▌| 642/671 [12:32:15<33:28, 69.27s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3581 [2024-07-30 00:15:38,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.14 | bwd_microstep: 5184.03 | bwd_inner_microstep: 5095.90 | bwd_allreduce_microstep: 88.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3564 [2024-07-30 00:15:46,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3200.24 | bwd_microstep: 4828.41 | bwd_inner_microstep: 4778.24 | bwd_allreduce_microstep: 50.10 | step_microstep: 0.19 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3616 [2024-07-30 00:15:55,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.01 | bwd_microstep: 5141.05 | bwd_inner_microstep: 5064.28 | bwd_allreduce_microstep: 76.71 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3613 [2024-07-30 00:16:03,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3202.74 | bwd_microstep: 4796.77 | bwd_inner_microstep: 4758.74 | bwd_allreduce_microstep: 37.96 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3720 [2024-07-30 00:16:11,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.85 | bwd_microstep: 4979.56 | bwd_inner_microstep: 4960.20 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3714 [2024-07-30 00:16:20,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.65 | 
bwd_microstep: 5062.16 | bwd_inner_microstep: 5020.95 | bwd_allreduce_microstep: 41.12 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3743 [2024-07-30 00:16:29,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.54 | bwd_microstep: 4993.91 | bwd_inner_microstep: 4974.54 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-30 00:16:38,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.59 [2024-07-30 00:16:38,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3685.12 | bwd_microstep: 4941.58 | bwd_inner_microstep: 4891.21 | bwd_allreduce_microstep: 50.31 | step_microstep: 180.65 [2024-07-30 00:16:38,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28364.21 | bwd: 39927.44 | bwd_inner: 39544.01 | bwd_allreduce: 382.94 | step: 181.34 96%|█████████▌| 643/671 [12:33:23<32:14, 69.08s/it] {'loss': 1.1088, 'learning_rate': 9.143157125359403e-08, 'epoch': 0.96} 96%|█████████▌| 643/671 [12:33:23<32:14, 69.08s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3532 [2024-07-30 00:16:47,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3678.90 | bwd_microstep: 5356.47 | bwd_inner_microstep: 5187.43 | bwd_allreduce_microstep: 168.96 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3799 [2024-07-30 00:16:55,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.19 | bwd_microstep: 5033.97 | bwd_inner_microstep: 5014.68 | bwd_allreduce_microstep: 19.23 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2224 [2024-07-30 00:17:03,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3035.18 | bwd_microstep: 4996.18 | bwd_inner_microstep: 4610.64 | bwd_allreduce_microstep: 
385.48 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3755 [2024-07-30 00:17:12,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3120.83 | bwd_microstep: 4997.07 | bwd_inner_microstep: 4952.94 | bwd_allreduce_microstep: 44.06 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3726 [2024-07-30 00:17:20,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.36 | bwd_microstep: 4987.14 | bwd_inner_microstep: 4967.78 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709 [2024-07-30 00:17:29,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3716.73 | bwd_microstep: 4958.76 | bwd_inner_microstep: 4928.19 | bwd_allreduce_microstep: 30.51 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2157 [2024-07-30 00:17:38,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.34 | bwd_microstep: 5119.10 | bwd_inner_microstep: 4721.73 | bwd_allreduce_microstep: 397.30 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3679 [2024-07-30 00:17:46,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-30 00:17:46,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3662.56 | bwd_microstep: 4884.93 | bwd_inner_microstep: 4865.52 | bwd_allreduce_microstep: 19.35 | step_microstep: 181.22 [2024-07-30 00:17:46,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28223.00 | bwd: 40333.60 | bwd_inner: 39248.86 | bwd_allreduce: 1084.27 | step: 181.78 96%|█████████▌| 644/671 [12:34:32<31:03, 69.02s/it] {'loss': 1.069, 'learning_rate': 8.502647233740169e-08, 'epoch': 0.96} 96%|█████████▌| 644/671 [12:34:32<31:03, 69.02s/it]dynamic ViT batch size: 20, images per sample: 10.0, 
dynamic token length: 3749 [2024-07-30 00:17:55,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3292.32 | bwd_microstep: 4954.14 | bwd_inner_microstep: 4921.27 | bwd_allreduce_microstep: 32.81 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3828 [2024-07-30 00:18:04,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3784.43 | bwd_microstep: 5105.35 | bwd_inner_microstep: 5079.98 | bwd_allreduce_microstep: 25.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3652 [2024-07-30 00:18:12,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.75 | bwd_microstep: 5204.12 | bwd_inner_microstep: 5125.68 | bwd_allreduce_microstep: 78.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3767 [2024-07-30 00:18:21,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.39 | bwd_microstep: 5122.93 | bwd_inner_microstep: 5075.23 | bwd_allreduce_microstep: 47.64 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3762 [2024-07-30 00:18:30,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.14 | bwd_microstep: 5151.84 | bwd_inner_microstep: 5103.83 | bwd_allreduce_microstep: 47.94 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3751 [2024-07-30 00:18:39,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3723.97 | bwd_microstep: 5003.42 | bwd_inner_microstep: 4984.06 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3703 [2024-07-30 00:18:47,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3211.92 | bwd_microstep: 4762.95 | bwd_inner_microstep: 4735.54 | bwd_allreduce_microstep: 27.34 | step_microstep: 
0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-30 00:18:55,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-30 00:18:55,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3197.77 | bwd_microstep: 4717.74 | bwd_inner_microstep: 4694.46 | bwd_allreduce_microstep: 23.20 | step_microstep: 181.08 [2024-07-30 00:18:55,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28014.60 | bwd: 40022.46 | bwd_inner: 39720.01 | bwd_allreduce: 301.99 | step: 181.64 96%|█████████▌| 645/671 [12:35:41<29:49, 68.82s/it] {'loss': 1.1003, 'learning_rate': 7.885298685522235e-08, 'epoch': 0.96} 96%|█████████▌| 645/671 [12:35:41<29:49, 68.82s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2420 [2024-07-30 00:19:04,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.19 | bwd_microstep: 5369.72 | bwd_inner_microstep: 4957.57 | bwd_allreduce_microstep: 412.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2335 [2024-07-30 00:19:13,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3546.90 | bwd_microstep: 5204.49 | bwd_inner_microstep: 4799.99 | bwd_allreduce_microstep: 404.43 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3777 [2024-07-30 00:19:21,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3758.06 | bwd_microstep: 5091.67 | bwd_inner_microstep: 5063.27 | bwd_allreduce_microstep: 28.33 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3740 [2024-07-30 00:19:30,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3626.91 | bwd_microstep: 5179.12 | bwd_inner_microstep: 5122.57 | bwd_allreduce_microstep: 56.49 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 
2172 [2024-07-30 00:19:39,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.78 | bwd_microstep: 5226.05 | bwd_inner_microstep: 4821.32 | bwd_allreduce_microstep: 404.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3730 [2024-07-30 00:19:48,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.16 | bwd_microstep: 5129.82 | bwd_inner_microstep: 5077.75 | bwd_allreduce_microstep: 52.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3719 [2024-07-30 00:19:56,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3094.93 | bwd_microstep: 4915.29 | bwd_inner_microstep: 4874.62 | bwd_allreduce_microstep: 40.60 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3731 [2024-07-30 00:20:05,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.69 [2024-07-30 00:20:05,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3765.82 | bwd_microstep: 5026.03 | bwd_inner_microstep: 5000.54 | bwd_allreduce_microstep: 25.42 | step_microstep: 181.14 [2024-07-30 00:20:05,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28571.65 | bwd: 41142.17 | bwd_inner: 39717.57 | bwd_allreduce: 1424.14 | step: 181.73 96%|█████████▋| 646/671 [12:36:51<28:49, 69.19s/it] {'loss': 1.1347, 'learning_rate': 7.291125901946027e-08, 'epoch': 0.96} 96%|█████████▋| 646/671 [12:36:51<28:49, 69.19s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3880 [2024-07-30 00:20:14,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3654.26 | bwd_microstep: 5154.55 | bwd_inner_microstep: 5117.20 | bwd_allreduce_microstep: 37.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3810 [2024-07-30 00:20:23,028] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 3784.22 | bwd_microstep: 5084.29 | bwd_inner_microstep: 5060.18 | bwd_allreduce_microstep: 24.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658 [2024-07-30 00:20:31,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3214.68 | bwd_microstep: 4824.22 | bwd_inner_microstep: 4787.95 | bwd_allreduce_microstep: 36.20 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3774 [2024-07-30 00:20:39,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3774.46 | bwd_microstep: 5028.33 | bwd_inner_microstep: 5008.87 | bwd_allreduce_microstep: 19.39 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3716 [2024-07-30 00:20:48,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.37 | bwd_microstep: 5188.76 | bwd_inner_microstep: 5131.85 | bwd_allreduce_microstep: 56.84 | step_microstep: 0.18 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2190 [2024-07-30 00:20:57,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3470.68 | bwd_microstep: 5050.66 | bwd_inner_microstep: 4656.54 | bwd_allreduce_microstep: 394.05 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2120 [2024-07-30 00:21:05,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.29 | bwd_microstep: 5132.75 | bwd_inner_microstep: 4735.79 | bwd_allreduce_microstep: 396.90 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-30 00:21:14,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-30 00:21:14,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3498.70 | bwd_microstep: 5109.51 | bwd_inner_microstep: 4711.74 | bwd_allreduce_microstep: 397.71 | 
step_microstep: 182.41 [2024-07-30 00:21:14,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28525.55 | bwd: 40573.06 | bwd_inner: 39210.06 | bwd_allreduce: 1362.51 | step: 183.08 96%|█████████▋| 647/671 [12:38:00<27:42, 69.26s/it] {'loss': 1.1776, 'learning_rate': 6.720142762867032e-08, 'epoch': 0.96} 96%|█████████▋| 647/671 [12:38:00<27:42, 69.26s/it]dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3613 [2024-07-30 00:21:23,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.98 | bwd_microstep: 5262.09 | bwd_inner_microstep: 5168.72 | bwd_allreduce_microstep: 93.30 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2334 [2024-07-30 00:21:32,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3558.73 | bwd_microstep: 5244.18 | bwd_inner_microstep: 4837.37 | bwd_allreduce_microstep: 406.75 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3751 [2024-07-30 00:21:41,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.35 | bwd_microstep: 5119.83 | bwd_inner_microstep: 5090.20 | bwd_allreduce_microstep: 29.56 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2159 [2024-07-30 00:21:50,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.49 | bwd_microstep: 5262.63 | bwd_inner_microstep: 4852.68 | bwd_allreduce_microstep: 409.89 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3727 [2024-07-30 00:21:58,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.23 | bwd_microstep: 5099.19 | bwd_inner_microstep: 5055.34 | bwd_allreduce_microstep: 43.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3754 [2024-07-30 00:22:07,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3600.15 | bwd_microstep: 5153.11 | bwd_inner_microstep: 5098.83 | bwd_allreduce_microstep: 54.21 | step_microstep: 0.08 dynamic ViT batch size: 6, images per sample: 3.0, dynamic token length: 1098 [2024-07-30 00:22:15,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 2972.27 | bwd_microstep: 4947.26 | bwd_inner_microstep: 4566.25 | bwd_allreduce_microstep: 380.95 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3687 [2024-07-30 00:22:24,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.63 [2024-07-30 00:22:24,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.48 | bwd_microstep: 4918.22 | bwd_inner_microstep: 4893.14 | bwd_allreduce_microstep: 25.01 | step_microstep: 181.07 [2024-07-30 00:22:24,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28395.57 | bwd: 41006.49 | bwd_inner: 39562.47 | bwd_allreduce: 1443.56 | step: 181.66 97%|█████████▋| 648/671 [12:39:10<26:36, 69.40s/it] {'loss': 1.1104, 'learning_rate': 6.172362606431281e-08, 'epoch': 0.96} 97%|█████████▋| 648/671 [12:39:10<26:36, 69.40s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3847 [2024-07-30 00:22:33,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3839.18 | bwd_microstep: 5187.04 | bwd_inner_microstep: 5156.80 | bwd_allreduce_microstep: 30.17 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2285 [2024-07-30 00:22:42,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3567.00 | bwd_microstep: 5246.28 | bwd_inner_microstep: 4838.06 | bwd_allreduce_microstep: 408.15 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2227 [2024-07-30 00:22:50,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3065.68 | bwd_microstep: 5062.54 | bwd_inner_microstep: 4671.91 | 
bwd_allreduce_microstep: 390.56 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2119 [2024-07-30 00:22:59,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3535.29 | bwd_microstep: 5168.34 | bwd_inner_microstep: 4767.20 | bwd_allreduce_microstep: 401.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3142 [2024-07-30 00:23:07,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3592.80 | bwd_microstep: 5177.50 | bwd_inner_microstep: 4900.55 | bwd_allreduce_microstep: 276.88 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2173 [2024-07-30 00:23:16,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3518.11 | bwd_microstep: 5119.00 | bwd_inner_microstep: 4723.30 | bwd_allreduce_microstep: 395.63 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2155 [2024-07-30 00:23:24,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3018.48 | bwd_microstep: 4913.46 | bwd_inner_microstep: 4533.98 | bwd_allreduce_microstep: 379.42 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3704 [2024-07-30 00:23:33,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.62 [2024-07-30 00:23:33,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.86 | bwd_microstep: 5150.87 | bwd_inner_microstep: 5077.40 | bwd_allreduce_microstep: 73.40 | step_microstep: 181.30 [2024-07-30 00:23:33,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27750.29 | bwd: 41025.01 | bwd_inner: 38669.14 | bwd_allreduce: 2355.40 | step: 181.87 97%|█████████▋| 649/671 [12:40:19<25:24, 69.31s/it] {'loss': 1.1623, 'learning_rate': 5.647798228764156e-08, 'epoch': 0.97} 97%|█████████▋| 649/671 [12:40:19<25:24, 69.31s/it]dynamic ViT batch size: 20, 
images per sample: 10.0, dynamic token length: 3752 [2024-07-30 00:23:42,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3668.06 | bwd_microstep: 5331.39 | bwd_inner_microstep: 5263.67 | bwd_allreduce_microstep: 67.65 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3575 [2024-07-30 00:23:50,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3129.79 | bwd_microstep: 5042.08 | bwd_inner_microstep: 4964.61 | bwd_allreduce_microstep: 77.41 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2234 [2024-07-30 00:23:59,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.63 | bwd_microstep: 5178.69 | bwd_inner_microstep: 4776.28 | bwd_allreduce_microstep: 402.34 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3755 [2024-07-30 00:24:08,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3601.60 | bwd_microstep: 5153.75 | bwd_inner_microstep: 5102.49 | bwd_allreduce_microstep: 51.19 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3654 [2024-07-30 00:24:17,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3610.96 | bwd_microstep: 5127.52 | bwd_inner_microstep: 5038.80 | bwd_allreduce_microstep: 88.65 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3741 [2024-07-30 00:24:25,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3704.93 | bwd_microstep: 4985.87 | bwd_inner_microstep: 4966.54 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2135 [2024-07-30 00:24:34,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3474.26 | bwd_microstep: 5055.60 | bwd_inner_microstep: 4663.48 | bwd_allreduce_microstep: 
392.06 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3684 [2024-07-30 00:24:43,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-30 00:24:43,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3616.64 | bwd_microstep: 5163.71 | bwd_inner_microstep: 5086.03 | bwd_allreduce_microstep: 77.61 | step_microstep: 180.73 [2024-07-30 00:24:43,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28319.78 | bwd: 41038.59 | bwd_inner: 39861.85 | bwd_allreduce: 1176.28 | step: 181.31 97%|█████████▋| 650/671 [12:41:29<24:17, 69.42s/it] {'loss': 1.0729, 'learning_rate': 5.146461883671072e-08, 'epoch': 0.97} 97%|█████████▋| 650/671 [12:41:29<24:17, 69.42s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2348 [2024-07-30 00:24:52,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.77 | bwd_microstep: 5344.46 | bwd_inner_microstep: 4933.15 | bwd_allreduce_microstep: 411.24 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3876 [2024-07-30 00:25:01,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.16 | bwd_microstep: 5114.13 | bwd_inner_microstep: 5094.75 | bwd_allreduce_microstep: 19.31 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2238 [2024-07-30 00:25:09,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3510.65 | bwd_microstep: 5193.37 | bwd_inner_microstep: 4791.03 | bwd_allreduce_microstep: 402.27 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3622 [2024-07-30 00:25:18,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.54 | bwd_microstep: 5165.25 | bwd_inner_microstep: 5072.60 | bwd_allreduce_microstep: 92.59 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 
10.0, dynamic token length: 3626 [2024-07-30 00:25:27,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3564.11 | bwd_microstep: 5032.62 | bwd_inner_microstep: 4969.70 | bwd_allreduce_microstep: 62.86 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2161 [2024-07-30 00:25:35,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3440.29 | bwd_microstep: 5022.88 | bwd_inner_microstep: 4633.85 | bwd_allreduce_microstep: 388.97 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3729 [2024-07-30 00:25:44,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3727.23 | bwd_microstep: 4999.98 | bwd_inner_microstep: 4980.70 | bwd_allreduce_microstep: 19.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3677 [2024-07-30 00:25:53,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-30 00:25:53,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.68 | bwd_microstep: 4982.15 | bwd_inner_microstep: 4928.26 | bwd_allreduce_microstep: 53.83 | step_microstep: 181.10 [2024-07-30 00:25:53,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28767.34 | bwd: 40854.83 | bwd_inner: 39403.99 | bwd_allreduce: 1450.36 | step: 181.66 97%|█████████▋| 651/671 [12:42:39<23:11, 69.58s/it] {'loss': 1.0913, 'learning_rate': 4.6683652823513725e-08, 'epoch': 0.97} 97%|█████████▋| 651/671 [12:42:39<23:11, 69.58s/it]dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2351 [2024-07-30 00:26:02,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3578.26 | bwd_microstep: 5289.78 | bwd_inner_microstep: 4880.98 | bwd_allreduce_microstep: 408.72 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3790 [2024-07-30 00:26:10,865] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.36 | bwd_microstep: 5039.41 | bwd_inner_microstep: 5020.00 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3797 [2024-07-30 00:26:19,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3744.48 | bwd_microstep: 5064.05 | bwd_inner_microstep: 5044.71 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-07-30 00:26:28,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3621.52 | bwd_microstep: 5169.36 | bwd_inner_microstep: 5088.93 | bwd_allreduce_microstep: 80.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3740 [2024-07-30 00:26:37,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3557.62 | bwd_microstep: 5068.99 | bwd_inner_microstep: 5026.29 | bwd_allreduce_microstep: 42.63 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2213 [2024-07-30 00:26:45,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.79 | bwd_microstep: 5208.66 | bwd_inner_microstep: 4802.13 | bwd_allreduce_microstep: 406.47 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3676 [2024-07-30 00:26:54,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.56 | bwd_microstep: 5188.84 | bwd_inner_microstep: 5110.02 | bwd_allreduce_microstep: 78.75 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3689 [2024-07-30 00:27:03,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-30 00:27:03,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.72 | bwd_microstep: 5183.09 | bwd_inner_microstep: 5106.61 | 
bwd_allreduce_microstep: 76.41 | step_microstep: 180.91 [2024-07-30 00:27:03,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29029.21 | bwd: 41212.15 | bwd_inner: 40079.62 | bwd_allreduce: 1132.06 | step: 181.51 97%|█████████▋| 652/671 [12:43:49<22:07, 69.88s/it] {'loss': 1.149, 'learning_rate': 4.2135195931249925e-08, 'epoch': 0.97} 97%|█████████▋| 652/671 [12:43:49<22:07, 69.88s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2364 [2024-07-30 00:27:12,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3134.77 | bwd_microstep: 5211.32 | bwd_inner_microstep: 4811.62 | bwd_allreduce_microstep: 399.63 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2061 [2024-07-30 00:27:20,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3529.46 | bwd_microstep: 5240.87 | bwd_inner_microstep: 4834.23 | bwd_allreduce_microstep: 406.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3773 [2024-07-30 00:27:29,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.61 | bwd_microstep: 5115.58 | bwd_inner_microstep: 5069.49 | bwd_allreduce_microstep: 46.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3752 [2024-07-30 00:27:38,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3634.24 | bwd_microstep: 5211.00 | bwd_inner_microstep: 5150.76 | bwd_allreduce_microstep: 60.18 | step_microstep: 0.18 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3653 [2024-07-30 00:27:47,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3562.48 | bwd_microstep: 5097.75 | bwd_inner_microstep: 5035.30 | bwd_allreduce_microstep: 62.38 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3658 [2024-07-30 00:27:55,953] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3619.88 | bwd_microstep: 5139.71 | bwd_inner_microstep: 5066.14 | bwd_allreduce_microstep: 73.51 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3673 [2024-07-30 00:28:04,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3595.16 | bwd_microstep: 5135.80 | bwd_inner_microstep: 5064.53 | bwd_allreduce_microstep: 71.21 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2153 [2024-07-30 00:28:13,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.58 [2024-07-30 00:28:13,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3507.10 | bwd_microstep: 5150.34 | bwd_inner_microstep: 4750.73 | bwd_allreduce_microstep: 399.55 | step_microstep: 182.44 [2024-07-30 00:28:13,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28163.61 | bwd: 41302.36 | bwd_inner: 39782.74 | bwd_allreduce: 1519.15 | step: 183.10 97%|█████████▋| 653/671 [12:44:59<20:57, 69.85s/it] {'loss': 1.0957, 'learning_rate': 3.781935441171225e-08, 'epoch': 0.97} 97%|█████████▋| 653/671 [12:44:59<20:57, 69.85s/it]dynamic ViT batch size: 24, images per sample: 12.0, dynamic token length: 3677 [2024-07-30 00:28:22,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3701.23 | bwd_microstep: 5272.96 | bwd_inner_microstep: 5204.01 | bwd_allreduce_microstep: 68.88 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2252 [2024-07-30 00:28:30,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3037.94 | bwd_microstep: 4950.56 | bwd_inner_microstep: 4570.93 | bwd_allreduce_microstep: 379.57 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2217 [2024-07-30 00:28:39,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3521.31 | 
bwd_microstep: 5168.43 | bwd_inner_microstep: 4766.75 | bwd_allreduce_microstep: 401.61 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3751 [2024-07-30 00:28:48,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3720.59 | bwd_microstep: 4996.24 | bwd_inner_microstep: 4976.81 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3728 [2024-07-30 00:28:56,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3224.39 | bwd_microstep: 4804.99 | bwd_inner_microstep: 4785.64 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3675 [2024-07-30 00:29:04,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.69 | bwd_microstep: 5087.44 | bwd_inner_microstep: 5024.27 | bwd_allreduce_microstep: 63.10 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3690 [2024-07-30 00:29:13,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.69 | bwd_microstep: 5044.45 | bwd_inner_microstep: 4984.94 | bwd_allreduce_microstep: 59.44 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3660 [2024-07-30 00:29:22,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-30 00:29:22,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3575.74 | bwd_microstep: 5041.44 | bwd_inner_microstep: 4985.97 | bwd_allreduce_microstep: 55.40 | step_microstep: 180.61 [2024-07-30 00:29:22,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27934.47 | bwd: 40366.48 | bwd_inner: 39299.26 | bwd_allreduce: 1066.75 | step: 181.20 97%|█████████▋| 654/671 [12:46:08<19:41, 69.49s/it] {'loss': 1.0875, 'learning_rate': 3.373622908280916e-08, 'epoch': 0.97} 97%|█████████▋| 
654/671 [12:46:08<19:41, 69.49s/it]dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2400 [2024-07-30 00:29:31,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3660.38 | bwd_microstep: 5524.07 | bwd_inner_microstep: 5099.37 | bwd_allreduce_microstep: 424.63 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3859 [2024-07-30 00:29:40,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3635.23 | bwd_microstep: 5232.36 | bwd_inner_microstep: 5180.05 | bwd_allreduce_microstep: 52.25 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3613 [2024-07-30 00:29:48,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3566.03 | bwd_microstep: 5106.58 | bwd_inner_microstep: 5040.14 | bwd_allreduce_microstep: 66.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3772 [2024-07-30 00:29:57,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3747.32 | bwd_microstep: 5016.55 | bwd_inner_microstep: 4997.20 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3732 [2024-07-30 00:30:06,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.45 | bwd_microstep: 5030.48 | bwd_inner_microstep: 4991.60 | bwd_allreduce_microstep: 38.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3753 [2024-07-30 00:30:14,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3205.62 | bwd_microstep: 4810.59 | bwd_inner_microstep: 4791.23 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3725 [2024-07-30 00:30:23,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.14 | bwd_microstep: 
4999.19 | bwd_inner_microstep: 4979.84 | bwd_allreduce_microstep: 19.28 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3786 [2024-07-30 00:30:32,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.61 [2024-07-30 00:30:32,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3763.95 | bwd_microstep: 5048.35 | bwd_inner_microstep: 5029.03 | bwd_allreduce_microstep: 19.26 | step_microstep: 180.67 [2024-07-30 00:30:32,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28884.02 | bwd: 40768.16 | bwd_inner: 40108.42 | bwd_allreduce: 659.27 | step: 181.25 98%|█████████▊| 655/671 [12:47:18<18:34, 69.63s/it] {'loss': 1.1031, 'learning_rate': 2.988591532620322e-08, 'epoch': 0.97} 98%|█████████▊| 655/671 [12:47:18<18:34, 69.63s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3899 [2024-07-30 00:30:41,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3817.57 | bwd_microstep: 5184.63 | bwd_inner_microstep: 5160.08 | bwd_allreduce_microstep: 24.49 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2253 [2024-07-30 00:30:49,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3059.24 | bwd_microstep: 5079.85 | bwd_inner_microstep: 4688.49 | bwd_allreduce_microstep: 391.29 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2288 [2024-07-30 00:30:58,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3541.01 | bwd_microstep: 5194.02 | bwd_inner_microstep: 4790.08 | bwd_allreduce_microstep: 403.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3754 [2024-07-30 00:31:06,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3583.26 | bwd_microstep: 5098.35 | bwd_inner_microstep: 5052.79 | bwd_allreduce_microstep: 45.49 | 
step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3629 [2024-07-30 00:31:15,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.42 | bwd_microstep: 5077.82 | bwd_inner_microstep: 4997.54 | bwd_allreduce_microstep: 80.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3704 [2024-07-30 00:31:24,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3726.11 | bwd_microstep: 5023.58 | bwd_inner_microstep: 4983.31 | bwd_allreduce_microstep: 40.20 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3700 [2024-07-30 00:31:32,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3107.19 | bwd_microstep: 4831.76 | bwd_inner_microstep: 4793.60 | bwd_allreduce_microstep: 38.09 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3657 [2024-07-30 00:31:40,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.51 [2024-07-30 00:31:40,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3146.68 | bwd_microstep: 4789.71 | bwd_inner_microstep: 4756.83 | bwd_allreduce_microstep: 32.80 | step_microstep: 180.42 [2024-07-30 00:31:40,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27509.39 | bwd: 40279.70 | bwd_inner: 39222.67 | bwd_allreduce: 1056.56 | step: 180.98 98%|█████████▊| 656/671 [12:48:26<17:17, 69.18s/it] {'loss': 1.1561, 'learning_rate': 2.6268503085089547e-08, 'epoch': 0.98} 98%|█████████▊| 656/671 [12:48:26<17:17, 69.18s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3751 [2024-07-30 00:31:49,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3700.00 | bwd_microstep: 5430.16 | bwd_inner_microstep: 5346.54 | bwd_allreduce_microstep: 83.54 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic 
token length: 3827 [2024-07-30 00:31:58,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3780.85 | bwd_microstep: 5080.09 | bwd_inner_microstep: 5055.65 | bwd_allreduce_microstep: 24.38 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2264 [2024-07-30 00:32:07,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3539.37 | bwd_microstep: 5207.88 | bwd_inner_microstep: 4804.72 | bwd_allreduce_microstep: 403.09 | step_microstep: 0.08 dynamic ViT batch size: 4, images per sample: 2.0, dynamic token length: 1181 [2024-07-30 00:32:15,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3472.67 | bwd_microstep: 5194.29 | bwd_inner_microstep: 4794.51 | bwd_allreduce_microstep: 399.72 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2247 [2024-07-30 00:32:24,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3495.60 | bwd_microstep: 5083.05 | bwd_inner_microstep: 4688.54 | bwd_allreduce_microstep: 394.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3667 [2024-07-30 00:32:33,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.72 | bwd_microstep: 5155.22 | bwd_inner_microstep: 5079.54 | bwd_allreduce_microstep: 75.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3724 [2024-07-30 00:32:41,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3730.85 | bwd_microstep: 4980.98 | bwd_inner_microstep: 4961.66 | bwd_allreduce_microstep: 19.25 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3691 [2024-07-30 00:32:50,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-30 00:32:50,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.48 | 
bwd_microstep: 5054.51 | bwd_inner_microstep: 4995.96 | bwd_allreduce_microstep: 58.48 | step_microstep: 181.31 [2024-07-30 00:32:50,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28902.44 | bwd: 41186.18 | bwd_inner: 39727.07 | bwd_allreduce: 1458.64 | step: 181.89 98%|█████████▊| 657/671 [12:49:36<16:13, 69.55s/it] {'loss': 1.2111, 'learning_rate': 2.2884076862089712e-08, 'epoch': 0.98} 98%|█████████▊| 657/671 [12:49:36<16:13, 69.55s/it]dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2062 [2024-07-30 00:32:58,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3099.83 | bwd_microstep: 5174.55 | bwd_inner_microstep: 4776.96 | bwd_allreduce_microstep: 397.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3845 [2024-07-30 00:33:07,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3646.72 | bwd_microstep: 5099.17 | bwd_inner_microstep: 5055.91 | bwd_allreduce_microstep: 43.20 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3602 [2024-07-30 00:33:15,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3201.83 | bwd_microstep: 4792.50 | bwd_inner_microstep: 4753.06 | bwd_allreduce_microstep: 39.37 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2082 [2024-07-30 00:33:24,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3551.84 | bwd_microstep: 5229.10 | bwd_inner_microstep: 4823.39 | bwd_allreduce_microstep: 405.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3786 [2024-07-30 00:33:33,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3572.96 | bwd_microstep: 5122.61 | bwd_inner_microstep: 5078.47 | bwd_allreduce_microstep: 44.08 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token 
length: 3720 [2024-07-30 00:33:42,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.99 | bwd_microstep: 4987.98 | bwd_inner_microstep: 4968.53 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2169 [2024-07-30 00:33:50,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3058.90 | bwd_microstep: 4999.52 | bwd_inner_microstep: 4612.69 | bwd_allreduce_microstep: 386.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3696 [2024-07-30 00:33:58,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.54 [2024-07-30 00:33:58,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3528.24 | bwd_microstep: 4936.29 | bwd_inner_microstep: 4896.68 | bwd_allreduce_microstep: 39.54 | step_microstep: 180.83 [2024-07-30 00:33:58,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27398.20 | bwd: 40341.71 | bwd_inner: 38965.64 | bwd_allreduce: 1375.60 | step: 181.42 98%|█████████▊| 658/671 [12:50:44<14:58, 69.10s/it] {'loss': 1.1655, 'learning_rate': 1.973271571728441e-08, 'epoch': 0.98} 98%|█████████▊| 658/671 [12:50:44<14:58, 69.10s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3908 [2024-07-30 00:34:07,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3790.16 | bwd_microstep: 5163.37 | bwd_inner_microstep: 5144.34 | bwd_allreduce_microstep: 18.96 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2254 [2024-07-30 00:34:16,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.49 | bwd_microstep: 5265.95 | bwd_inner_microstep: 4856.52 | bwd_allreduce_microstep: 409.36 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3593 [2024-07-30 00:34:25,362] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 3593.93 | bwd_microstep: 5156.33 | bwd_inner_microstep: 5076.59 | bwd_allreduce_microstep: 79.67 | step_microstep: 0.19 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2211 [2024-07-30 00:34:34,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3513.54 | bwd_microstep: 5163.13 | bwd_inner_microstep: 4760.88 | bwd_allreduce_microstep: 402.19 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3715 [2024-07-30 00:34:42,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.19 | bwd_microstep: 4982.22 | bwd_inner_microstep: 4962.86 | bwd_allreduce_microstep: 19.29 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-30 00:34:51,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3587.65 | bwd_microstep: 5110.58 | bwd_inner_microstep: 5037.92 | bwd_allreduce_microstep: 72.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3692 [2024-07-30 00:35:00,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.23 | bwd_microstep: 5057.76 | bwd_inner_microstep: 5000.58 | bwd_allreduce_microstep: 57.12 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3662 [2024-07-30 00:35:08,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.55 [2024-07-30 00:35:08,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3580.85 | bwd_microstep: 5047.98 | bwd_inner_microstep: 4988.61 | bwd_allreduce_microstep: 59.30 | step_microstep: 180.80 [2024-07-30 00:35:08,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28943.94 | bwd: 40947.29 | bwd_inner: 39828.23 | bwd_allreduce: 1118.59 | step: 181.49 98%|█████████▊| 659/671 [12:51:54<13:53, 69.44s/it] {'loss': 1.1432, 'learning_rate': 
1.6814493266357202e-08, 'epoch': 0.98} 98%|█████████▊| 659/671 [12:51:54<13:53, 69.44s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2381 [2024-07-30 00:35:17,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3620.76 | bwd_microstep: 5365.08 | bwd_inner_microstep: 4951.58 | bwd_allreduce_microstep: 413.44 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3835 [2024-07-30 00:35:26,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3773.24 | bwd_microstep: 5039.85 | bwd_inner_microstep: 5020.51 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2223 [2024-07-30 00:35:34,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3047.94 | bwd_microstep: 4998.22 | bwd_inner_microstep: 4609.24 | bwd_allreduce_microstep: 388.92 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2165 [2024-07-30 00:35:43,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3565.29 | bwd_microstep: 5214.52 | bwd_inner_microstep: 4810.69 | bwd_allreduce_microstep: 403.76 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3695 [2024-07-30 00:35:52,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3559.63 | bwd_microstep: 5016.62 | bwd_inner_microstep: 4965.46 | bwd_allreduce_microstep: 51.09 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2152 [2024-07-30 00:36:00,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3054.20 | bwd_microstep: 5017.67 | bwd_inner_microstep: 4631.53 | bwd_allreduce_microstep: 386.08 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2174 [2024-07-30 00:36:08,945] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 3492.06 | bwd_microstep: 5071.31 | bwd_inner_microstep: 4676.25 | bwd_allreduce_microstep: 394.99 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2145 [2024-07-30 00:36:17,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.53 [2024-07-30 00:36:17,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3470.97 | bwd_microstep: 5051.65 | bwd_inner_microstep: 4659.93 | bwd_allreduce_microstep: 391.66 | step_microstep: 181.32 [2024-07-30 00:36:17,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 27583.99 | bwd: 40774.90 | bwd_inner: 38325.14 | bwd_allreduce: 2449.29 | step: 181.90 98%|█████████▊| 660/671 [12:53:03<12:41, 69.21s/it] {'loss': 1.1066, 'learning_rate': 1.4129477678884728e-08, 'epoch': 0.98} 98%|█████████▊| 660/671 [12:53:03<12:41, 69.21s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3925 [2024-07-30 00:36:26,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3333.80 | bwd_microstep: 5005.58 | bwd_inner_microstep: 4979.51 | bwd_allreduce_microstep: 26.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3590 [2024-07-30 00:36:34,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.28 | bwd_microstep: 5162.23 | bwd_inner_microstep: 5083.09 | bwd_allreduce_microstep: 79.07 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3747 [2024-07-30 00:36:43,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3602.46 | bwd_microstep: 5159.27 | bwd_inner_microstep: 5105.61 | bwd_allreduce_microstep: 53.60 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3742 [2024-07-30 00:36:52,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3618.27 | bwd_microstep: 5166.40 | 
bwd_inner_microstep: 5109.96 | bwd_allreduce_microstep: 56.37 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 5.0, dynamic token length: 2082 [2024-07-30 00:37:01,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3536.41 | bwd_microstep: 5186.91 | bwd_inner_microstep: 4784.53 | bwd_allreduce_microstep: 402.31 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3646 [2024-07-30 00:37:09,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3519.21 | bwd_microstep: 4987.11 | bwd_inner_microstep: 4933.82 | bwd_allreduce_microstep: 53.23 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 9.0, dynamic token length: 3148 [2024-07-30 00:37:18,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3497.30 | bwd_microstep: 4987.21 | bwd_inner_microstep: 4813.30 | bwd_allreduce_microstep: 173.85 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-07-30 00:37:26,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.44 [2024-07-30 00:37:26,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3590.24 | bwd_microstep: 5051.05 | bwd_inner_microstep: 4992.27 | bwd_allreduce_microstep: 58.72 | step_microstep: 180.36 [2024-07-30 00:37:26,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28283.86 | bwd: 40705.75 | bwd_inner: 39802.02 | bwd_allreduce: 903.25 | step: 180.94 99%|█████████▊| 661/671 [12:54:12<11:32, 69.24s/it] {'loss': 1.1371, 'learning_rate': 1.1677731676734694e-08, 'epoch': 0.98} 99%|█████████▊| 661/671 [12:54:12<11:32, 69.24s/it]dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2099 [2024-07-30 00:37:35,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3107.91 | bwd_microstep: 5185.79 | bwd_inner_microstep: 4787.18 | bwd_allreduce_microstep: 398.55 | step_microstep: 
0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3593 [2024-07-30 00:37:44,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3623.37 | bwd_microstep: 5225.35 | bwd_inner_microstep: 5142.25 | bwd_allreduce_microstep: 83.03 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2217 [2024-07-30 00:37:52,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3527.84 | bwd_microstep: 5189.04 | bwd_inner_microstep: 4785.20 | bwd_allreduce_microstep: 403.78 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626 [2024-07-30 00:38:01,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.63 | bwd_microstep: 5135.43 | bwd_inner_microstep: 5064.12 | bwd_allreduce_microstep: 71.24 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3631 [2024-07-30 00:38:10,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.91 | bwd_microstep: 5108.06 | bwd_inner_microstep: 5033.33 | bwd_allreduce_microstep: 74.66 | step_microstep: 0.07 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3717 [2024-07-30 00:38:19,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3724.34 | bwd_microstep: 5022.09 | bwd_inner_microstep: 5001.72 | bwd_allreduce_microstep: 20.30 | step_microstep: 0.07 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2176 [2024-07-30 00:38:27,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3505.39 | bwd_microstep: 5078.52 | bwd_inner_microstep: 4683.67 | bwd_allreduce_microstep: 394.78 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2167 [2024-07-30 00:38:36,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.48 [2024-07-30 00:38:36,519] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3494.58 | bwd_microstep: 5112.26 | bwd_inner_microstep: 4712.72 | bwd_allreduce_microstep: 399.47 | step_microstep: 181.21 [2024-07-30 00:38:36,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28154.87 | bwd: 41056.52 | bwd_inner: 39210.14 | bwd_allreduce: 1845.92 | step: 181.77 99%|█████████▊| 662/671 [12:55:22<10:23, 69.33s/it] {'loss': 1.1792, 'learning_rate': 9.459312532608122e-09, 'epoch': 0.99} 99%|█████████▊| 662/671 [12:55:22<10:23, 69.33s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3882 [2024-07-30 00:38:45,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.52 | bwd_microstep: 5176.59 | bwd_inner_microstep: 5139.23 | bwd_allreduce_microstep: 37.30 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3605 [2024-07-30 00:38:54,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3589.92 | bwd_microstep: 5174.80 | bwd_inner_microstep: 5090.59 | bwd_allreduce_microstep: 84.14 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3612 [2024-07-30 00:39:02,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3548.29 | bwd_microstep: 5002.23 | bwd_inner_microstep: 4944.62 | bwd_allreduce_microstep: 57.55 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3745 [2024-07-30 00:39:11,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3628.97 | bwd_microstep: 5161.72 | bwd_inner_microstep: 5108.64 | bwd_allreduce_microstep: 53.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2172 [2024-07-30 00:39:20,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.87 | bwd_microstep: 5150.47 | bwd_inner_microstep: 4748.45 | bwd_allreduce_microstep: 401.95 | step_microstep: 0.08 
dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3639 [2024-07-30 00:39:28,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3202.24 | bwd_microstep: 4718.69 | bwd_inner_microstep: 4691.17 | bwd_allreduce_microstep: 27.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3695 [2024-07-30 00:39:37,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3737.44 | bwd_microstep: 5028.26 | bwd_inner_microstep: 4990.56 | bwd_allreduce_microstep: 37.64 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2135 [2024-07-30 00:39:45,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.52 [2024-07-30 00:39:45,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3466.56 | bwd_microstep: 5062.30 | bwd_inner_microstep: 4668.90 | bwd_allreduce_microstep: 393.33 | step_microstep: 180.83 [2024-07-30 00:39:45,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28551.74 | bwd: 40475.05 | bwd_inner: 39382.11 | bwd_allreduce: 1092.47 | step: 181.41 99%|█████████▉| 663/671 [12:56:31<09:14, 69.34s/it] {'loss': 1.0863, 'learning_rate': 7.474272068698219e-09, 'epoch': 0.99} 99%|█████████▉| 663/671 [12:56:31<09:14, 69.34s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3901 [2024-07-30 00:39:54,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3790.73 | bwd_microstep: 5152.24 | bwd_inner_microstep: 5133.11 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3837 [2024-07-30 00:40:03,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3609.02 | bwd_microstep: 5136.05 | bwd_inner_microstep: 5093.70 | bwd_allreduce_microstep: 42.29 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3758 
[2024-07-30 00:40:12,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3754.13 | bwd_microstep: 5001.78 | bwd_inner_microstep: 4982.37 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3754 [2024-07-30 00:40:21,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3768.29 | bwd_microstep: 5034.55 | bwd_inner_microstep: 5011.67 | bwd_allreduce_microstep: 22.81 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-07-30 00:40:30,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3568.26 | bwd_microstep: 5253.08 | bwd_inner_microstep: 4845.19 | bwd_allreduce_microstep: 407.82 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2193 [2024-07-30 00:40:38,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3048.45 | bwd_microstep: 4984.00 | bwd_inner_microstep: 4599.49 | bwd_allreduce_microstep: 384.45 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3714 [2024-07-30 00:40:46,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.49 | bwd_microstep: 4966.85 | bwd_inner_microstep: 4947.44 | bwd_allreduce_microstep: 19.34 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3690 [2024-07-30 00:40:55,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.67 [2024-07-30 00:40:55,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3714.71 | bwd_microstep: 5257.50 | bwd_inner_microstep: 5238.12 | bwd_allreduce_microstep: 19.32 | step_microstep: 180.97 [2024-07-30 00:40:55,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28979.00 | bwd: 40786.05 | bwd_inner: 39851.03 | bwd_allreduce: 934.52 | step: 181.55 99%|█████████▉| 664/671 [12:57:41<08:06, 
69.57s/it] {'loss': 1.1954, 'learning_rate': 5.722656655482439e-09, 'epoch': 0.99} 99%|█████████▉| 664/671 [12:57:41<08:06, 69.57s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3928 [2024-07-30 00:41:04,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3650.30 | bwd_microstep: 5248.49 | bwd_inner_microstep: 5201.43 | bwd_allreduce_microstep: 47.00 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3801 [2024-07-30 00:41:13,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.09 | bwd_microstep: 5012.39 | bwd_inner_microstep: 4993.05 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3760 [2024-07-30 00:41:22,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.05 | bwd_microstep: 5155.02 | bwd_inner_microstep: 5102.98 | bwd_allreduce_microstep: 51.98 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3800 [2024-07-30 00:41:31,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.83 | bwd_microstep: 5037.88 | bwd_inner_microstep: 5018.60 | bwd_allreduce_microstep: 19.22 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3744 [2024-07-30 00:41:40,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3750.85 | bwd_microstep: 5035.12 | bwd_inner_microstep: 5008.57 | bwd_allreduce_microstep: 26.48 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2180 [2024-07-30 00:41:48,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3052.93 | bwd_microstep: 5049.27 | bwd_inner_microstep: 4659.40 | bwd_allreduce_microstep: 389.80 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3698 [2024-07-30 
00:41:57,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3553.27 | bwd_microstep: 5266.99 | bwd_inner_microstep: 5126.28 | bwd_allreduce_microstep: 140.63 | step_microstep: 0.18 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 2140 [2024-07-30 00:42:05,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-30 00:42:05,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3542.30 | bwd_microstep: 5195.96 | bwd_inner_microstep: 4792.31 | bwd_allreduce_microstep: 403.58 | step_microstep: 180.86 [2024-07-30 00:42:05,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28646.52 | bwd: 41001.09 | bwd_inner: 39902.57 | bwd_allreduce: 1098.05 | step: 181.53 99%|█████████▉| 665/671 [12:58:51<06:58, 69.69s/it] {'loss': 1.1196, 'learning_rate': 4.204507210633368e-09, 'epoch': 0.99} 99%|█████████▉| 665/671 [12:58:51<06:58, 69.69s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3985 [2024-07-30 00:42:15,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3878.48 | bwd_microstep: 5237.99 | bwd_inner_microstep: 5218.87 | bwd_allreduce_microstep: 19.05 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 6.0, dynamic token length: 3304 [2024-07-30 00:42:24,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3646.05 | bwd_microstep: 5334.16 | bwd_inner_microstep: 5037.69 | bwd_allreduce_microstep: 296.40 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3809 [2024-07-30 00:42:32,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3742.78 | bwd_microstep: 5038.86 | bwd_inner_microstep: 5019.47 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2258 [2024-07-30 00:42:41,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 3510.14 | bwd_microstep: 5144.55 | bwd_inner_microstep: 4744.87 | bwd_allreduce_microstep: 399.62 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3709 [2024-07-30 00:42:50,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3686.88 | bwd_microstep: 4905.99 | bwd_inner_microstep: 4885.84 | bwd_allreduce_microstep: 20.08 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3633 [2024-07-30 00:42:57,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3154.19 | bwd_microstep: 4646.67 | bwd_inner_microstep: 4627.33 | bwd_allreduce_microstep: 19.27 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2168 [2024-07-30 00:43:06,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3473.18 | bwd_microstep: 5051.33 | bwd_inner_microstep: 4659.23 | bwd_allreduce_microstep: 392.03 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3680 [2024-07-30 00:43:15,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.57 [2024-07-30 00:43:15,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3725.99 | bwd_microstep: 4962.54 | bwd_inner_microstep: 4932.74 | bwd_allreduce_microstep: 29.74 | step_microstep: 181.48 [2024-07-30 00:43:15,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28817.59 | bwd: 40322.08 | bwd_inner: 39125.98 | bwd_allreduce: 1195.61 | step: 182.06 99%|█████████▉| 666/671 [13:00:01<05:48, 69.62s/it] {'loss': 1.1339, 'learning_rate': 2.9198591980705847e-09, 'epoch': 0.99} 99%|█████████▉| 666/671 [13:00:01<05:48, 69.62s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3947 [2024-07-30 00:43:24,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3827.57 | bwd_microstep: 5173.91 | bwd_inner_microstep: 5154.74 
| bwd_allreduce_microstep: 19.09 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3593 [2024-07-30 00:43:32,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3242.58 | bwd_microstep: 4910.93 | bwd_inner_microstep: 4854.65 | bwd_allreduce_microstep: 56.21 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3626 [2024-07-30 00:43:41,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3612.80 | bwd_microstep: 5177.49 | bwd_inner_microstep: 5097.77 | bwd_allreduce_microstep: 79.65 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3743 [2024-07-30 00:43:50,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3613.91 | bwd_microstep: 5178.87 | bwd_inner_microstep: 5123.22 | bwd_allreduce_microstep: 55.58 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3735 [2024-07-30 00:43:58,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.01 | bwd_microstep: 5112.30 | bwd_inner_microstep: 5065.57 | bwd_allreduce_microstep: 46.66 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-30 00:44:07,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3591.71 | bwd_microstep: 5079.63 | bwd_inner_microstep: 5031.90 | bwd_allreduce_microstep: 47.66 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3714 [2024-07-30 00:44:16,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3733.77 | bwd_microstep: 5022.15 | bwd_inner_microstep: 4996.15 | bwd_allreduce_microstep: 25.91 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3688 [2024-07-30 00:44:25,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 
92.49 [2024-07-30 00:44:25,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3588.74 | bwd_microstep: 5049.92 | bwd_inner_microstep: 4993.38 | bwd_allreduce_microstep: 56.48 | step_microstep: 181.19 [2024-07-30 00:44:25,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28796.99 | bwd: 40705.16 | bwd_inner: 40317.32 | bwd_allreduce: 387.34 | step: 181.75 99%|█████████▉| 667/671 [13:01:11<04:38, 69.69s/it] {'loss': 1.1454, 'learning_rate': 1.8687426271246646e-09, 'epoch': 0.99} 99%|█████████▉| 667/671 [13:01:11<04:38, 69.69s/it]dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2352 [2024-07-30 00:44:34,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.53 | bwd_microstep: 5340.93 | bwd_inner_microstep: 4931.75 | bwd_allreduce_microstep: 409.12 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3805 [2024-07-30 00:44:43,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3749.89 | bwd_microstep: 5093.52 | bwd_inner_microstep: 5067.94 | bwd_allreduce_microstep: 25.52 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3765 [2024-07-30 00:44:51,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3624.90 | bwd_microstep: 5210.13 | bwd_inner_microstep: 5149.92 | bwd_allreduce_microstep: 60.14 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3741 [2024-07-30 00:44:59,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3087.54 | bwd_microstep: 4937.19 | bwd_inner_microstep: 4894.71 | bwd_allreduce_microstep: 42.40 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2190 [2024-07-30 00:45:08,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3561.66 | bwd_microstep: 5241.47 | bwd_inner_microstep: 4834.12 | 
bwd_allreduce_microstep: 407.28 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3724 [2024-07-30 00:45:17,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3606.78 | bwd_microstep: 5152.30 | bwd_inner_microstep: 5096.73 | bwd_allreduce_microstep: 55.50 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3715 [2024-07-30 00:45:26,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3574.17 | bwd_microstep: 5066.46 | bwd_inner_microstep: 5025.02 | bwd_allreduce_microstep: 41.37 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 3696 [2024-07-30 00:45:35,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.56 [2024-07-30 00:45:35,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3600.79 | bwd_microstep: 5076.95 | bwd_inner_microstep: 5005.02 | bwd_allreduce_microstep: 71.86 | step_microstep: 181.07 [2024-07-30 00:45:35,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28421.16 | bwd: 41118.92 | bwd_inner: 40005.13 | bwd_allreduce: 1113.31 | step: 181.63 100%|█████████▉| 668/671 [13:02:21<03:29, 69.74s/it] {'loss': 1.1141, 'learning_rate': 1.0511820518432915e-09, 'epoch': 0.99} 100%|█████████▉| 668/671 [13:02:21<03:29, 69.74s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3892 [2024-07-30 00:45:44,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3648.32 | bwd_microstep: 5214.54 | bwd_inner_microstep: 5170.92 | bwd_allreduce_microstep: 43.55 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3584 [2024-07-30 00:45:52,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3586.67 | bwd_microstep: 5170.65 | bwd_inner_microstep: 5081.00 | bwd_allreduce_microstep: 89.57 | step_microstep: 0.08 dynamic ViT batch size: 
20, images per sample: 10.0, dynamic token length: 3813 [2024-07-30 00:46:01,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3579.29 | bwd_microstep: 5031.62 | bwd_inner_microstep: 4997.66 | bwd_allreduce_microstep: 33.89 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3780 [2024-07-30 00:46:10,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3759.05 | bwd_microstep: 5030.89 | bwd_inner_microstep: 5007.95 | bwd_allreduce_microstep: 22.87 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3775 [2024-07-30 00:46:18,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3577.14 | bwd_microstep: 5098.28 | bwd_inner_microstep: 5054.40 | bwd_allreduce_microstep: 43.81 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3674 [2024-07-30 00:46:26,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3199.94 | bwd_microstep: 4740.02 | bwd_inner_microstep: 4714.48 | bwd_allreduce_microstep: 25.46 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 8.0, dynamic token length: 3722 [2024-07-30 00:46:35,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.28 | bwd_microstep: 5050.74 | bwd_inner_microstep: 4994.22 | bwd_allreduce_microstep: 56.45 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3723 [2024-07-30 00:46:43,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.65 [2024-07-30 00:46:43,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3242.86 | bwd_microstep: 4799.18 | bwd_inner_microstep: 4779.76 | bwd_allreduce_microstep: 19.35 | step_microstep: 369.82 [2024-07-30 00:46:43,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28166.44 | bwd: 40135.89 | bwd_inner: 39800.34 | bwd_allreduce: 335.07 | 
step: 370.41 100%|█████████▉| 669/671 [13:03:29<02:18, 69.47s/it] {'loss': 1.2043, 'learning_rate': 4.671965704128312e-10, 'epoch': 1.0} 100%|█████████▉| 669/671 [13:03:29<02:18, 69.47s/it]dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3571 [2024-07-30 00:46:52,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3599.25 | bwd_microstep: 5141.46 | bwd_inner_microstep: 5064.67 | bwd_allreduce_microstep: 76.72 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3570 [2024-07-30 00:47:01,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3581.17 | bwd_microstep: 5158.92 | bwd_inner_microstep: 5073.63 | bwd_allreduce_microstep: 85.23 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3757 [2024-07-30 00:47:10,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3636.20 | bwd_microstep: 5211.52 | bwd_inner_microstep: 5151.88 | bwd_allreduce_microstep: 59.57 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3740 [2024-07-30 00:47:18,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3220.40 | bwd_microstep: 4803.58 | bwd_inner_microstep: 4784.19 | bwd_allreduce_microstep: 19.32 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3624 [2024-07-30 00:47:27,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3573.95 | bwd_microstep: 5102.83 | bwd_inner_microstep: 5031.41 | bwd_allreduce_microstep: 71.35 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3723 [2024-07-30 00:47:35,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3739.80 | bwd_microstep: 4990.95 | bwd_inner_microstep: 4971.51 | bwd_allreduce_microstep: 19.37 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 
4.0, dynamic token length: 2186 [2024-07-30 00:47:44,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3500.39 | bwd_microstep: 5103.05 | bwd_inner_microstep: 4706.65 | bwd_allreduce_microstep: 396.33 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3705 [2024-07-30 00:47:53,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.50 [2024-07-30 00:47:53,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3540.63 | bwd_microstep: 4961.55 | bwd_inner_microstep: 4919.64 | bwd_allreduce_microstep: 41.84 | step_microstep: 181.17 [2024-07-30 00:47:53,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 28391.68 | bwd: 40473.85 | bwd_inner: 39703.53 | bwd_allreduce: 769.84 | step: 181.74 100%|█████████▉| 670/671 [13:04:39<01:09, 69.38s/it] {'loss': 1.1129, 'learning_rate': 1.167998247131319e-10, 'epoch': 1.0} 100%|█████████▉| 670/671 [13:04:39<01:09, 69.38s/it]dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3969 [2024-07-30 00:48:02,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3852.57 | bwd_microstep: 5275.06 | bwd_inner_microstep: 5255.94 | bwd_allreduce_microstep: 19.06 | step_microstep: 0.08 dynamic ViT batch size: 8, images per sample: 4.0, dynamic token length: 2261 [2024-07-30 00:48:11,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3625.19 | bwd_microstep: 5356.76 | bwd_inner_microstep: 4941.15 | bwd_allreduce_microstep: 415.56 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3635 [2024-07-30 00:48:20,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3615.37 | bwd_microstep: 5220.71 | bwd_inner_microstep: 5134.07 | bwd_allreduce_microstep: 86.58 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 7.0, dynamic token length: 2249 [2024-07-30 00:48:28,985] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3570.80 | bwd_microstep: 5254.16 | bwd_inner_microstep: 4846.28 | bwd_allreduce_microstep: 407.82 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 10.0, dynamic token length: 3632 [2024-07-30 00:48:37,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3607.98 | bwd_microstep: 5180.16 | bwd_inner_microstep: 5093.91 | bwd_allreduce_microstep: 86.19 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3728 [2024-07-30 00:48:46,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3722.27 | bwd_microstep: 4983.85 | bwd_inner_microstep: 4964.41 | bwd_allreduce_microstep: 19.36 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3669 [2024-07-30 00:48:55,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3729.47 | bwd_microstep: 5052.12 | bwd_inner_microstep: 5008.73 | bwd_allreduce_microstep: 43.31 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 13.0, dynamic token length: 3716 [2024-07-30 00:49:04,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_step: 92.66 [2024-07-30 00:49:04,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3734.63 | bwd_microstep: 5023.12 | bwd_inner_microstep: 4998.60 | bwd_allreduce_microstep: 24.46 | step_microstep: 181.04 [2024-07-30 00:49:04,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 29458.18 | bwd: 41345.92 | bwd_inner: 40243.02 | bwd_allreduce: 1102.43 | step: 181.62 100%|██████████| 671/671 [13:05:50<00:00, 69.91s/it] {'loss': 1.0999, 'learning_rate': 0.0, 'epoch': 1.0} 100%|██████████| 671/671 [13:05:50<00:00, 69.91s/it]petrel_client is not installed. If you read data locally instead of from ceph, ignore it. Replace train sampler!! petrel_client is not installed. Using PIL to load images. petrel_client is not installed. 
If you read data locally instead of from ceph, ignore it. Replace train sampler!! petrel_client is not installed. Using PIL to load images. [INFO|trainer.py:1962] 2024-07-30 00:49:05,552 >> Training completed. 
Do not forget to share your model on huggingface.co/models =) {'train_runtime': 47161.9361, 'train_samples_per_second': 1.823, 'train_steps_per_second': 0.014, 'train_loss': 1.1755684873563876, 'epoch': 1.0} 100%|██████████| 671/671 [13:05:51<00:00, 69.91s/it] 100%|██████████| 671/671 [13:05:51<00:00, 70.27s/it] [INFO|trainer.py:2936] 2024-07-30 00:49:32,355 >> Saving model checkpoint to /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w [INFO|configuration_utils.py:473] 2024-07-30 00:49:32,357 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/config.json [INFO|configuration_utils.py:594] 2024-07-30 00:49:32,357 >> Configuration saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/generation_config.json [INFO|modeling_utils.py:2501] 2024-07-30 00:50:28,881 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 11 checkpoint shards. You can find where each parameters has been saved in the index located at /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/model.safetensors.index.json. 
[INFO|tokenization_utils_base.py:2433] 2024-07-30 00:50:28,883 >> tokenizer config file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/tokenizer_config.json [INFO|tokenization_utils_base.py:2442] 2024-07-30 00:50:28,883 >> Special tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/special_tokens_map.json [INFO|tokenization_utils_base.py:2493] 2024-07-30 00:50:28,883 >> added tokens file saved in /data/jcy/ckpt/internvl-v1_5-finetune-series/caption-10w/added_tokens.json ***** train metrics ***** epoch = 1.0 train_loss = 1.1756 train_runtime = 13:06:01.93 train_samples = 85997 train_samples_per_second = 1.823 train_steps_per_second = 0.014 wandb: | 1.777 MB of 1.777 MB uploaded wandb: wandb: Run history: wandb: train/epoch ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███ wandb: train/global_step ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███ wandb: train/learning_rate ▃███████▇▇▇▇▇▆▆▆▆▅▅▅▅▄▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁ wandb: train/loss ▇▆█▇▆▆▅█▄▃▅█▅▃▅▃▅▅▅▅▅▄▄▃▄▃▂▃▄▄▂▄▃▃▄▄▁▂▂▅ wandb: train/total_flos ▁ wandb: train/train_loss ▁ wandb: train/train_runtime ▁ wandb: train/train_samples_per_second ▁ wandb: train/train_steps_per_second ▁ wandb: wandb: Run summary: wandb: train/epoch 1.0 wandb: train/global_step 671 wandb: train/learning_rate 0.0 wandb: train/loss 1.0999 wandb: train/total_flos 2.178765965849998e+19 wandb: train/train_loss 1.17557 wandb: train/train_runtime 47161.9361 wandb: train/train_samples_per_second 1.823 wandb: train/train_steps_per_second 0.014 wandb: wandb: 🚀 View run swept-microwave-27 at: https://wandb.ai/pku_kcl/huggingface/runs/8a7wdzgp wandb: ⭐️ View project at: https://wandb.ai/pku_kcl/huggingface wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s) wandb: 
Find logs at: ./wandb/run-20240729_114309-8a7wdzgp/logs